Introduction


THE RESEARCH worker in psychology and education is
frequently confronted with the problem of analyzing
differences between several groups of individuals with
respect to several variables. In almost all such problems
the investigator has resorted either to analysis of
variance, which analyzes several groups with respect to
one variable, or to correlational techniques, which
analyze one group with respect to several variables.
However, since about 1935, multivariate statistical tools
have been available that permit the simultaneous analysis
of differences between several groups with respect to
several variables. See Moonan (8) for a brief discussion
of the origin and development of multivariate theory.

This investigation is concerned with the use 
of multivariate statistical tools in analyzing dif- 
ferences between five groups of college students 
following pre-professional curricula with respect 
to four commonly used measures of college apti- 
tude. Essentially the purpose of this analysis is 
two-fold: (1) to determine what differences exist 
between these groups, and (2) to utilize these dif- 
ferences for predictive purposes. 


Population, Sampling, and Variables 





Population. — The population was defined as students
matriculating in the College of Science, Literature and
the Arts at the University of Minnesota with the
following restrictions:


1. They followed a pre-business, pre-law, pre-medical, or
   special medical science (occupational therapy, physical
   therapy, medical technology, nursing) curriculum.
2. They completed at least three quarters of work.
3. They had no previous college work.
4. They had no military service.
5. They entered college in the fall quarter of 1949 or
   1950.
6. They had complete test data available.


Sampling. —Two samples were selected, one from students
who entered the college in the fall quarter of 1949 and
one from students who entered the college in the fall
quarter of 1950. The first sample was used in the main
analysis, and the second sample was used for
cross-validation purposes.

All the blueprints of students who had entered 
the college in the fall quarter of 1949 and 1950 
were examined, and the students meeting the re- 
quirements listed above were selected. On the 
basis of courses completed, the two samples 
were divided into the following five groups: 


1. Pre-business (bus.)
2. Pre-law (law)
3. Special medical science curricula (S.M.S.)
4. Pre-medical (med.)
5. Unsuccessful (uns.)


All the students in the first four groups had 
at least a C average for three quarters. The last 
group (unsuccessful) consisted of students regis- 
tered in the above-mentioned programs who did 
not maintain a C average for three quarters. 


* The author wishes to acknowledge the help of his advisors, Drs. W. W. Cook and P. O. Johnson.
Dr. Johnson has been especially helpful in carrying out this investigation.

Variables. —The following variables or measures were
used:


1. American Council on Education Psycholog- 
ical Examination, 1947 (ACE) 

2. Ohio State University Psychological Test, 
Form 22 (Ohio) 

3. Cooperative English Test, Form S (Eng. ) 

4. High school percentile rank transformed 
to probit values (HSR) 


Statistical Analysis 


Tests of Assumptions. —The statistical tools 
used in this investigation are based on the assump- 
tion that the samples are random samples from 
a normal multivariate population. Since no single
over-all test exists to determine if the samples
came from a normal multivariate population, it
is necessary to consider important properties of
the normal multivariate distribution and to determine
if the samples conform to these properties.
Mahalanobis and Rao (7) selected the following 
properties of the normal multivariate distribu- 
tion as necessary but not sufficient conditions for 
a multivariate distribution to be of the normal 
type: (1) normality of distribution of each vari- 
able; (2) homogeneity of variances; (3) homogene- 
ity of covariances; and (4) linearity of regression. 

Each of the four variables was tested singly
to determine whether it conformed to the above
properties by methods illustrated by Johnson (5). The
data did not completely meet all the assumptions
underlying multivariate analysis.

Hypotheses of normality of distribution were 
rejected in the following instances: ACE at the 
one percent level for the Bus. and Law groups, 
Ohio at the five percent level for the S.M.S. and 
Law groups, and Eng. at the one percent level 
for the Med. and Law groups. Hypotheses of 
homogeneity of variances were rejected at the 
one percent level for Eng. and HSR. Hypotheses 
of homogeneity of covariances were rejected for 
ACE-HSR and Ohio-HSR at the one percent and 
five percent level respectively. No significant 
departures from linear regression were found. 

The first concern is, of course, what effect
these departures from the assumptions might have on
drawing valid conclusions.

As far as the author knows, no direct work has 
been done to determine the effects of not fulfill- 
ing all the assumptions in multivariate analysis. 
Although the extent of the bias is not known, the 
non-fulfillment of some of the assumptions should 
be considered in drawing conclusions. 

Over-all Test of Significance of Differences
Between Groups. —It is first necessary to test
whether the observed over-all differences between
groups are significantly greater than chance before
proceeding to apply the D²-statistic to pairs of
groups. If the observed differences between the groups
are not significant, the D²-statistic is not
applicable. The definition of D² is given in the
section "Generalized Distance Between the Groups."

If we were dealing with a single variable, the 
differences between the groups could be tested 
by the analysis of variance. In the case of anal- 
ysis of variance, the total sums of squares are 
partitioned into ‘‘between’’ and ‘‘within’’ sums 
of squares and the mean squares of these are
compared. When dealing with more than one 
variable, both sums of squares and cross prod- 
ucts matrices are partitioned into the ‘‘within’’ 
matrix and the ‘‘between’’ matrix. This is called 
analysis of dispersion and stems from the work 
of Bartlett (1), Rao (10), and Wilks (16), and its 
application has recently been illustrated by Rao 
and Slater (15). 

Let $X_{ijr}$ be the r'th observation on the i'th
variable of an individual for the j'th group. Let
$\bar{X}_{ij}$ represent the mean of the i'th variable for
the j'th group. Let $\bar{X}_i$ represent the mean of the
i'th variable for all of the groups.


The ‘‘total’’ sums of squares and cross prod- 
ucts matrix is defined as: 


$$\{S_{ih}\} = \sum_{j}\sum_{r} (X_{ijr} - \bar{X}_i)(X_{hjr} - \bar{X}_h) \qquad (1)$$


This matrix consists of elements represent- 
ing sums of squares of deviations from means 
and cross products for all the individuals based 
on all the variables disregarding groups. It has 
(N - 1) degrees of freedom. 

The ‘‘within’’ sums of squares and cross prod- 
ucts matrix is defined as: 


$$\{A_{ih}\} = \sum_{j}\sum_{r} (X_{ijr} - \bar{X}_{ij})(X_{hjr} - \bar{X}_{hj}) \qquad (2)$$


This matrix consists of elements represent- 
ing sums of squares of deviations from means 
and cross products within each group and based 
on all the variables. It has (N - 5) degrees of 
freedom. 

Similarly, we have a ‘‘between’’ sums of 
squares and cross products matrix with four de- 
grees of freedom. 

Computationally it may be easier to find the
"total" and "between" matrices and subtract
each element of the "between" matrix from the
corresponding element of the "total" matrix to obtain
the "within" matrix. However, if all three matrices are
found independently, the identity "between" plus
"within" equals "total" serves as a check on the
computations.
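
The partition and its arithmetic check can be sketched as follows. This is a minimal illustration using only the Python standard library; the scores, the group labels, and the helper names `sscp` and `mean` are hypothetical and are not taken from the study.

```python
def mean(rows):
    """Column means of a list of observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sscp(rows, centers):
    """Sum over observations of (x - c)(x - c)', pairing each row with its center."""
    p = len(rows[0])
    m = [[0.0] * p for _ in range(p)]
    for x, c in zip(rows, centers):
        for i in range(p):
            for h in range(p):
                m[i][h] += (x[i] - c[i]) * (x[h] - c[h])
    return m

# Hypothetical scores: two variables, three groups (not the study's data)
groups = {
    "a": [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]],
    "b": [[4.0, 6.0], [6.0, 5.0]],
    "c": [[7.0, 9.0], [8.0, 7.0], [9.0, 8.0]],
}
all_rows = [x for rows in groups.values() for x in rows]
grand = mean(all_rows)

# "Total": deviations of every observation from the grand means
total = sscp(all_rows, [grand] * len(all_rows))

within_rows, within_centers, group_means = [], [], []
for rows in groups.values():
    g = mean(rows)
    within_rows += rows
    within_centers += [g] * len(rows)
    group_means += [g] * len(rows)  # one copy of the group mean per member

# "Within": deviations from each observation's own group means
within = sscp(within_rows, within_centers)
# "Between": deviations of group means from grand means, weighted by group size
between = sscp(group_means, [grand] * len(group_means))

# Check: "between" plus "within" equals "total", element by element
check = all(
    abs(between[i][h] + within[i][h] - total[i][h]) < 1e-9
    for i in range(2) for h in range(2)
)
```

With real data the same check would be applied to the 4 x 4 matrices of Tables II, III, and IV.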
A summary of the totals and means for each
variable for each group is given in Table I. The
"total" sums of squares and cross products are
summarized in Table II, the "between" in Table
III, and the "within" in Table IV.

To test the hypothesis that no differences be- 
tween the means of the groups exist, Bartlett's 
(1) approximation was used. 


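$$\chi^2 = -\{n - \tfrac{1}{2}(p + q + 1)\}\log_e V \qquad (3)$$


where n is the degrees of freedom of the "total" matrix
(the sum of the "within" and "between" degrees of
freedom, here 335 + 4 = 339, as used in the computation
given for this problem), q is the degrees of freedom of
the "between" matrix, p is the number of variables,
and


$$V = \frac{\text{Determinant of the "within" matrix } \{A_{ih}\}}{\text{Determinant of the "total" matrix } \{S_{ih}\}} \qquad (4)$$


The chi-square table is entered with pq degrees
of freedom.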

In the present problem, the following results 
were obtained: 





$$V = \frac{|\text{"Within" matrix, Table IV}|}{|\text{"Total" matrix, Table II}|} = .58185$$

and

$$\chi^2 = \{339 - \tfrac{1}{2}(4 + 4 + 1)\}(.54154) = 181.15$$


This value was entered in the Chi-square table 
with 16 degrees of freedom, and p was less than 
.001. Therefore, the hypothesis was rejected. 

In other words, the result of this test provides
evidence that groups of students following different
curricula vary significantly with respect to the
four selected measures.
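
The arithmetic of formula (3) can be retraced with a short sketch. The function name `bartlett_chi_square` is hypothetical; the inputs are the values quoted in the text (V = .58185, the multiplier 339 that appears in the printed computation, and p = q = 4).

```python
import math

def bartlett_chi_square(v, n, p, q):
    """Chi-square approximation -(n - (p + q + 1)/2) * ln(V), with p*q degrees of freedom."""
    chi_square = -(n - 0.5 * (p + q + 1)) * math.log(v)
    return chi_square, p * q

# Values quoted in the text; 339 is the figure used in the printed arithmetic
chi2, df = bartlett_chi_square(v=0.58185, n=339, p=4, q=4)
```

Under these inputs `chi2` agrees with the reported 181.15 to two decimals, and `df` is the 16 degrees of freedom with which the chi-square table was entered.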

This result does not give evidence on the rel- 
ative differences between various pairs of groups 
or on what combinations of abilities are involved 
in successful achievement in different curricula. 
The remainder of the analysis attempts to provide 
information on these matters. 

Generalized Distance Between the Groups. —
Before discussing D², consider its definition.

Let $\{a_{ih}\}$ represent the "dispersion" matrix,
and $\{a^{ih}\}$ represent the inverse of the "dispersion"
matrix. The "dispersion" matrix is the "within" matrix,
as defined previously, with each element divided by the
number of degrees of freedom. Let $d_i$ represent the
difference between the means of two groups for the i'th
variable; i = 1, 2, 3, 4.


$$D^2 = \sum_{i=1}^{4}\sum_{h=1}^{4} a^{ih} d_i d_h \qquad (5)$$


Methods for determining D² values which do
not involve computation of inverses have been
developed by Rao (12, 14). His methods transform
the original variables into a set of standardized,
uncorrelated variables. Then D² is simply
a sum of squares of differences.

Consider the five groups and the four variables 
as described earlier. Geometrically, the vari- 
ables can be regarded as defining a space of four 
dimensions and an individual can be represented as 
a point in that space on the basis of the four meas- 
ures. Individuals belonging to any group can be 
represented as a cluster of points around a point 
defined by the means of the four variables for 
that group. Then the distance between the means
of any two groups, as defined in this space, is a
measure of the amount of overlap or the divergence
between the two groups. The distance between any
pair of groups can be compared with
the distance between any other pair of groups.

Extensive and detailed consideration and
application of the D²-statistic have recently been
carried out by Rao and Slater (15) and Mahalanobis
and Rao (7). Much of the discussion and application
in this paper stems from their work.

The ‘‘dispersion’’ matrix is summarized in 
Table V, and the inverse of the dispersion matrix 
is given in Table VI. 

Using formula number 5, D² values are computed
for all the possible combinations of pairs
of groups, and the results are summarized in
Table VII.
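
Formula (5) is a quadratic form in the mean differences, which can be sketched as below. The matrix is the inverse dispersion matrix of Table VI, with the one entry missing from the first row filled in by symmetry; the vector of mean differences `d` is hypothetical, since the group means of Table I are not reproduced here.

```python
# Inverse of the dispersion matrix (Table VI); order: ACE, Ohio, Eng., HSR
A_INV = [
    [0.006515, -0.001591, -0.001947, -0.014981],
    [-0.001591, 0.005555, -0.002933, -0.005846],
    [-0.001947, -0.002933, 0.005251, -0.009540],
    [-0.014981, -0.005846, -0.009540, 1.737518],
]

def d_squared(d, a_inv=A_INV):
    """Generalized distance: sum over i and h of a^{ih} * d_i * d_h."""
    p = len(d)
    return sum(a_inv[i][h] * d[i] * d[h] for i in range(p) for h in range(p))

# Hypothetical differences between two groups' means on the four measures
d = [5.0, 8.0, 6.0, 0.4]
d2 = d_squared(d)
```

Reversing the sign of every difference leaves D² unchanged, so the order in which the two groups are subtracted does not matter.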

The significance of the D² values was tested
by the following ratio:


$$F = \frac{n_1 n_2}{n_1 + n_2} \cdot \frac{1}{n} \cdot \frac{n - p + 1}{p} \cdot D^2$$


where $n_1$ and $n_2$ are the numbers in the two
samples, n is the degrees of freedom of the
"dispersion" matrix, and p is the number of
variables.

The F table is entered with p and (n - p + 1)
degrees of freedom.

This test of significance is due to Hotelling's
(4) work regarding the test of significance of the
Generalized T and points up the relationship
between the D²-statistic and the T statistic.

In all cases but one, pre-business and pre-law,
the D² values were significant at the one
percent level. Although the difference between
the pre-business and pre-law groups might have
been significant with larger samples, the
non-significant result is not surprising, in view of
the overlap of courses in the two curricula.

The D² values in Table VII are measures of
the magnitude of divergence or distance between 
the groups. Therefore, we can say that the pre- 
business group resembled the pre-law and unsuc- 
cessful groups more than the pre-medical group 
and the S.M.S. group. The pre-law group re- 
sembled the pre-medical group more than the 
S.M.S. and unsuccessful groups. The S.M.S. 
TABLE III


THE SUMS OF SQUARES AND CROSS PRODUCTS MATRIX BETWEEN ALL
GROUPS WITH 4 DEGREES OF FREEDOM


             ACE          Ohio         Eng.         HSR
ACE       14,680.96    18,158.67    11,454.17      774.26
Ohio      18,158.67    32,235.42    25,430.05    1,450.65
Eng.      11,454.17    25,430.05    24,209.75    1,290.60
HSR          774.26     1,450.65     1,290.60       71.82





TABLE IV


THE SUMS OF SQUARES AND CROSS PRODUCTS MATRIX WITHIN ALL
GROUPS WITH 335 DEGREES OF FREEDOM


             ACE          Ohio          Eng.          HSR
ACE       94,228.73     68,614.97     75,918.77    1,460.20
Ohio      68,614.97    137,113.41    104,996.42    1,629.48
Eng.      75,918.77    104,996.42    153,969.66    1,853.29
HSR        1,460.20      1,629.48      1,853.29      221.05





TABLE V


THE DISPERSION MATRIX {a_ih}


             ACE         Ohio        Eng.
ACE       281.2798    204.8208    226.6232
Ohio      204.8208    409.2938    313.4221
Eng.      226.6232    313.4221    459.6109
HSR         4.3588      4.8641      5.5322
and pre-medical groups resembled the unsuccessful
group least of all. Other comparisons could
also be made. In the next section the configuration
of the groups is represented graphically.

Dimensions of Variation Between the Groups. — 
As stated earlier, N individuals with four meas- 
ures on each can be represented as N points in 
a space of four dimensions. The problem is first
of all to determine whether the variation between
the groups extends significantly in more than
one direction or, to state it differently, whether one
component accounts for all the variation.

If all four dimensions or components are not
significant, then we have the problem of determining
the best representation of the N points in a
space of less than four dimensions; that is, it is a
problem of obtaining linear combinations of the
variables which will give the best representation.
Rao (11) gives a solution to this problem in terms
of maximum scatter, and it involves finding the
canonical variates. He points out that his solution
is similar to Hotelling's principal components.

The methods used here have been illustrated 
by Rao and Slater (15), who acknowledged the 
work of Fisher (3). 

Determining the number of dimensions involves
finding the roots of a determinantal equation, Table
VIII. The determinantal equation was obtained by
pre-multiplying the "between" matrix (Table III)
by the inverse of the "dispersion" matrix (Table
VI) and subtracting μ from the diagonal values.

Expanding Table VIII, the following equation
was obtained:


$$\mu^4 - 210.29\mu^3 + 10{,}024.99\mu^2 - 63{,}512.72\mu + 691.12 = 0$$


The four roots of this equation were equal to:
143.52, 59.31, 7.45, and .01. Note that the
first two roots absorb 96.4 percent of the
variation.
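
The roots can be recovered numerically from the expanded equation. The sketch below uses a simple bracketing-and-bisection search, which is an assumption of convenience and not the method used in the original computation; the function names are hypothetical.

```python
def f(mu):
    """Left-hand side of the expanded determinantal equation."""
    return mu**4 - 210.29 * mu**3 + 10024.99 * mu**2 - 63512.72 * mu + 691.12

def bisect(lo, hi, tol=1e-10):
    """Locate a root of f in [lo, hi], given a sign change between the endpoints."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0

# Scan unit intervals from 0 to 220 (the roots sum to 210.29) for sign changes
roots = [bisect(a, a + 1) for a in range(0, 220) if f(a) * f(a + 1) < 0]
roots.sort(reverse=True)

# Share of the total variation absorbed by the two largest roots
share_first_two = sum(roots[:2]) / sum(roots)
```

The four roots found this way lie close to the printed values, and `share_first_two` is about 0.964, matching the 96.4 percent noted above.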

To test the significance of the roots, the sum
of the roots and each root arranged in descending
order of magnitude can be regarded as χ² with
the following degrees of freedom:


$$p(s-1) = (p+s-2) + (p+s-4) + (p+s-6) + (p+s-8)$$


where p is the number of variables and s is the
number of groups.

The first and second roots were significant at
the one percent level, and the χ² value of the
third root had a probability lying between five and
ten percent. The sum of the roots can be used
as an over-all test of differences between groups,
and in this problem it was significant at the one
percent level.
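
The allocation of degrees of freedom to the successive roots, and the comparison of each root with tabled chi-square values, can be sketched as follows. The critical values are the standard tabled one and five percent points for 1, 3, 5, and 7 degrees of freedom; the function name and the generalization to `min(p, s - 1)` roots are assumptions made for the illustration.

```python
def root_degrees_of_freedom(p, s):
    """Degrees of freedom p+s-2, p+s-4, ... for the roots in descending order."""
    return [p + s - 2 * k for k in range(1, min(p, s - 1) + 1)]

roots = [143.52, 59.31, 7.45, 0.01]        # roots reported in the text
dfs = root_degrees_of_freedom(p=4, s=5)    # four variables, five groups

# Standard tabled chi-square critical values, keyed by degrees of freedom
CRIT_1PCT = {7: 18.475, 5: 15.086, 3: 11.345, 1: 6.635}
CRIT_5PCT = {7: 14.067, 5: 11.070, 3: 7.815, 1: 3.841}

levels = []
for root, df in zip(roots, dfs):
    if root > CRIT_1PCT[df]:
        levels.append("1%")
    elif root > CRIT_5PCT[df]:
        levels.append("5%")
    else:
        levels.append("n.s.")
```

The degrees of freedom come out as 7, 5, 3, and 1, which add to p(s - 1) = 16, the figure used for the sum of the roots.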

The above result is evidence of variation in
two dimensions. The third root, although not
significant at the five percent level, was included
in the remainder of the analysis. This makes
possible the representation of the groups in three
dimensions, and this may give a better configuration
of the groups than representation in just
two dimensions.

In order to represent the data in a space of
three dimensions, the direction cosines ($k_1$, $k_2$,
$k_3$, $k_4$) corresponding to the three roots were
found, standardized, and compounded into linear
functions of the variables $X_1$, $X_2$, $X_3$, $X_4$. These
linear functions are called canonical variates,
and it is necessary to compute a canonical variate
for each of the three roots.

The values of $k_1$, $k_2$, $k_3$, $k_4$ were obtained
from the system of equations at top of page 229.

The terms ($m_{ih}$) of these equations were obtained
from the determinantal equation, Table
VIII, and μ denotes a root. By putting $k_4 = 1$
arbitrarily, the proportional values of $k_1$, $k_2$, $k_3$ were
found.

To standardize the functions, each $k_i$ was divided
by the square root of the quantity


$$\sum_{i=1}^{p}\sum_{h=1}^{p} a_{ih} k_i k_h$$


where $a_{ih}$ denotes the elements of the "dispersion"
matrix.

The resulting linear function is called a canonical
variate:

$$A = k_1 X_1 + k_2 X_2 + k_3 X_3 + k_4 X_4$$


The three canonical variates for this problem 
were as follows: 


$$A_1 = -.0086 X_1 + .0277 X_2 + .0021 X_3 + .9017 X_4$$
$$A_2 = -.0681 X_1 - .0191 X_2 + .0539 X_3 + .1656 X_4$$
$$A_3 = .0380 X_1 - .0661 X_2 + .0252 X_3 + .5059 X_4$$
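
The procedure described above (fix the fourth weight at 1, solve for the other three, then standardize) can be sketched for the largest root. The matrix `M` is Table VIII without the μ terms. The dispersion matrix `A` is Table V with its HSR column filled in by symmetry and the HSR variance taken as 221.05 / 335 from Table IV; that last entry is an assumption, since it does not survive in the printed table. The function names are hypothetical.

```python
# Table VIII matrix: (inverse dispersion) x (between), without the mu terms
M = [
    [32.8556, -4.2269, -32.3062, -0.8524],
    [39.3926, 67.1105, 44.4883, 2.6213],
    [-29.0838, -10.2074, 17.9255, 0.3295],
    [909.9269, 1817.4445, 1691.2207, 92.3965],
]
# Dispersion matrix; last column by symmetry, last diagonal entry assumed
A = [
    [281.2798, 204.8208, 226.6232, 4.3588],
    [204.8208, 409.2938, 313.4221, 4.8641],
    [226.6232, 313.4221, 459.6109, 5.5322],
    [4.3588, 4.8641, 5.5322, 221.05 / 335],
]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def canonical_weights(mu):
    """Weights for root mu: set k4 = 1, solve the first three equations, standardize."""
    lhs = [[M[i][h] - (mu if i == h else 0.0) for h in range(3)] for i in range(3)]
    rhs = [-M[i][3] for i in range(3)]
    k = solve(lhs, rhs) + [1.0]
    scale = sum(A[i][h] * k[i] * k[h] for i in range(4) for h in range(4)) ** 0.5
    return [ki / scale for ki in k]

a1 = canonical_weights(143.52)
```

Under these assumptions the weights in `a1` come out close to the printed coefficients of the first canonical variate.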


The mean canonical variates for each group
were obtained by substituting the mean values of
the ACE, Ohio, Eng., and HSR for $X_1$, $X_2$, $X_3$,
and $X_4$. The results are summarized in Table
IX.

Utilizing the values in Table IX, the configur- 
ation of the groups is represented in Figure 1. 

Let us consider what these three canonical var- 
iates or dimensions represent. Since each dimen- 
sion is a linear compound of the four original var- 
iables, it may not be possible or meaningful to 
name them. However, by inspecting the equation 
for the canonical variate, the dimension can be 
described in terms of the loadings of the original 
variables, i.e., the weights attached to the vari- 
ables in making up the canonical variate. 
TABLE VI
THE INVERSE OF THE DISPERSION MATRIX {a^ih}


            ACE          Ohio         Eng.         HSR
ACE       .006515     -.001591     -.001947     -.014981
Ohio     -.001591      .005555     -.002933     -.005846
Eng.     -.001947     -.002933      .005251     -.009540
HSR      -.014981     -.005846     -.009540     1.737518


TABLE VII 
SUMMARY OF D? VALUES FOR ALL COMBINATIONS OF GROUPS 





Med. 
1. 2477 





. 6507 


S.M. 
Med. 


Uns. 


*Not significant at the 5 percent level; all other values significant at the 1 percent
level.





TABLE VIII 


DETERMINANTAL EQUATION 


 32.8556-μ      -4.2269      -32.3062       -.8524
 39.3926       67.1105-μ      44.4883       2.6213
-29.0838      -10.2074       17.9255-μ       .3295
909.9269     1817.4445     1691.2207      92.3965-μ
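$$\begin{aligned}
k_1(m_{11}-\mu) + k_2 m_{12} + k_3 m_{13} + k_4 m_{14} &= 0 \\
k_1 m_{21} + k_2(m_{22}-\mu) + k_3 m_{23} + k_4 m_{24} &= 0 \\
k_1 m_{31} + k_2 m_{32} + k_3(m_{33}-\mu) + k_4 m_{34} &= 0 \\
k_1 m_{41} + k_2 m_{42} + k_3 m_{43} + k_4(m_{44}-\mu) &= 0
\end{aligned}$$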


The first dimension has heavy positive loadings 
on the Ohio and HSR. The second dimension has 
a heavy positive loading on Eng., and a heavy neg- 
ative loading on ACE. And the third dimension 
has approximately equal positive loadings on the 
Eng., HSR, and ACE, but it has a negative load- 
ing on Ohio. As stated previously, each of the 
dimensions has some loadings on all of the vari- 
ables. 

With the descriptions of the dimensions in 
mind, consider the configuration of the groups 
in Figure 1. Most of the variation between the 
groups occurs on the first dimension. On this
dimension the unsuccessful group ranks lowest, 
then pre-business, pre-law, and S.M.S., and 
the pre-medical group ranks highest. A high 
rank on this dimension means relatively high 
scores on HSR and Ohio. 

Considerable variation between the groups al- 
so occurs on the second dimension, but much of 
this variation is due to the position of the S.M.S. 
group. A high rank on this dimension is due to 
a relatively high score on Eng., and a relatively 
low score on ACE. 

Although the third dimension shows the least 
amount of variation between the groups, three of 
the groups, pre-business, pre-law, and unsuc- 
cessful, vary as much on this dimension as they 
do on the second dimension. A high rank on this 
dimension is due to high scores on the Eng., HSR, 
and ACE, and a low score on the Ohio. 

Figure 1 also gives an over-all picture of the 
divergence or distance between the groups. Com- 
parisons of distances between various combina- 
tions of groups can be done more readily from 
Figure 1 than from the table of D² values.

Linear Discriminant Functions. —The problem
here is finding a statistical criterion to
determine the group to which an individual belongs.
Lubin (6) in a summary of recent developments 
of discriminant functions points out that a serious 
weakness in the usual approach to discriminant 
functions is that adequate discrimination can only 
be made if the groups lie on a linear continuum. 
If only one linear discriminant function is used 
in classifications involving more than two classes, 
the assumption of collinearity is made. 

Rao (9) avoids this difficulty in the extension 
of discriminant functions to more than two groups 
and does take into account the patterning of the 
variables. Rao's solution involves finding a linear
discriminant score for each group. See Rao
(13) for an extensive discussion of inference
applied to classificatory problems.


The linear discriminant score for the j'th
group is:


$$L_j = l_{1j}X_1 + l_{2j}X_2 + l_{3j}X_3 + l_{4j}X_4 - \tfrac{1}{2}\sum_{i} l_{ij}\bar{X}_{ij} + \log_e \pi_j \qquad (8)$$


where $\{l_{ij}\} = \{a^{ih}\}\{\bar{X}_{hj}\}$, that is, $l_{ij} = \sum_h a^{ih}\bar{X}_{hj}$,


and $\pi_j$ = the relative frequency of the number of
individuals in the j'th group.

The linear discriminant functions for this problem
are summarized in Table X. The term $\log_e \pi_j$
was based on the total enumeration of the first
sample.

In order to determine to which group an indi- 
vidual belongs, five linear discriminant scores, 
one for each group, are computed on the basis of 
his scores on the ACE, Ohio, Eng., and HSR. 
He is assigned to the group for which the discrim- 
inant score is the highest. 
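
The scoring and assignment rule can be sketched as below. The two groups, their means, the inverse dispersion matrix, and the relative frequencies are all hypothetical and chosen only to keep the arithmetic easy to follow; the study's own coefficients are those of Table X.

```python
import math

def discriminant_score(x, group_mean, a_inv, prior):
    """Linear score: sum(l_i x_i) - 1/2 sum(l_i mean_i) + log(prior), l = a_inv * mean."""
    p = len(x)
    l = [sum(a_inv[i][h] * group_mean[h] for h in range(p)) for i in range(p)]
    linear = sum(l[i] * x[i] for i in range(p))
    constant = -0.5 * sum(l[i] * group_mean[i] for i in range(p)) + math.log(prior)
    return linear + constant

def classify(x, means, a_inv, priors):
    """Assign x to the group whose discriminant score is highest."""
    scores = {g: discriminant_score(x, means[g], a_inv, priors[g]) for g in means}
    return max(scores, key=scores.get), scores

# Hypothetical two-variable, two-group setting (not the study's values)
a_inv = [[1.0, 0.0], [0.0, 1.0]]
means = {"low": [0.0, 0.0], "high": [4.0, 4.0]}
priors = {"low": 0.5, "high": 0.5}

group_near_high, _ = classify([3.5, 3.0], means, a_inv, priors)
group_near_low, _ = classify([0.5, 1.0], means, a_inv, priors)
```

Because every group shares the same dispersion matrix, the scores are linear in the measurements and only the group-specific weights and constant terms differ.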

Cross-Validation. —The effectiveness of the 
discriminant functions was tested by applying 
them to a different sample from the one on which 
the discriminant functions were based. The se- 
lection of this sample was described earlier. The 
results are summarized in Table XI. 

In this table the columns represent the ob- 
served and the rows represent the predicted; the 
number of correct classifications are the diagon- 
al values. 

The number of individuals correctly classified 
and the number of correct classifications expect- 
ed by chance (based on marginal totals) for each 
group are listed below. 





             Correct           Expected
             Classification    by Chance

Bus.              8               7.87
Law               6               1.49
S.M.S.           15               7.02
Med.             19              10.21
Uns.             30              12.76

Total            78              39.35


To test whether the number of correct classi- 
fications was greater than could be expected by 
chance, an approximate t test was computed. 
This yielded a t value of 6.94, which is clearly 
significant at the one percent level with 188 de- 
grees of freedom. Predictions were not made 
with the same accuracy for all groups. 
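
The chance expectations in the list above follow from the marginal totals of Table XI: for each group, the predicted total times the observed total, divided by the number of individuals. In the sketch below the predicted totals are the row totals of Table XI; the observed totals are an assumption, inferred from the surviving column totals and from the printed expectations, because the table is incomplete in this copy.

```python
# Row (predicted) totals as printed in Table XI
predicted_totals = {"Bus.": 37, "Law": 10, "S.M.S.": 33, "Med.": 48, "Uns.": 60}
# Column (observed) totals: inferred, not fully legible in the table
observed_totals = {"Bus.": 40, "Law": 28, "S.M.S.": 40, "Med.": 40, "Uns.": 40}
# Correct classifications (diagonal values) as listed in the text
correct = {"Bus.": 8, "Law": 6, "S.M.S.": 15, "Med.": 19, "Uns.": 30}

n = sum(predicted_totals.values())

# Expected number correct by chance for each group
expected = {
    g: predicted_totals[g] * observed_totals[g] / n for g in predicted_totals
}
total_correct = sum(correct.values())
total_expected = sum(expected.values())
```

Under these assumed marginals the group expectations agree with the printed column to within rounding, and the number correctly classified is roughly twice the number expected by chance.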





JOURNAL OF EXPERIMENTAL EDUCATION 


TABLE IX


MEAN VALUES OF THE FIRST, SECOND AND THIRD CANONICAL
VARIATES FOR EACH GROUP


                              A1            A2            A3
Pre-business              (illegible)   (illegible)     7.2590
Pre-law                     7.4734        2.2641        6.9978
Special medical science     7.8406        3.4654        7.3279
Pre-medical                 8.1135        2.0914        7.4522
Unsuccessful                6.3834        2.4682        7.3954


TABLE X 


THE LINEAR DISCRIMINANT SCORES FOR THE GROUPS 








; Coefficients of Measurements — ) Constant Terms 
X} X2 


KCE Ohio. Eng, wSR) «2 “1/2,2 


pti Xij + loge 


1518 -.3723  .5689 ~~ 5:: | -70.9556  -1. 4765 





1550 ~. 3399 . 546 5 ~71. 1045 -1. 9689 
0827 . 3744 : 6. 7900 -79. 7401 -1. 8490 
. 1786 . 3489 : 6. 8696 -79. 0267 -1. 5694 


. 1657 4003 ; 5. 3474 -66. 9053 -1.3224 


TABLE XI 


CLASSIFICATION OF INDIVIDUALS WITH THE LINEAR 
DISCRIMINANT FUNCTIONS 








S.M.S. Med. Uns. Total 





0 11 3 37 

0 2 0 10 

6 33 

9 19 48 


6 2 60 





‘Total 40 28 40 188 
These results corroborate earlier findings in
the study. However, it is questionable just how
useful the linear discriminant functions would be
in a practical situation. The results show promise
when the relatively small mean differences and
considerable overlap of the groups are taken into
account.

Equivalence of the Two Samples. —As stated
earlier, two samples were selected, one from
students who entered college in 1949 and one from
students who entered college in 1950. For valid
application of the results of this study, it is
essential to test for the equivalence of the two
samples. To state it differently, it is essential
to test whether students entering college during
a particular year differ significantly from
students entering college some other year.

The significance of the differences between
the means for the two samples was tested by
using the D²-statistic as described previously.

The D² value was equal to .0506 and F was
equal to 1.52, with 4 and 523 degrees of freedom,
which is not significant at the five percent level.
Therefore the hypothesis that the two samples
differ only by chance variation was accepted.
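
The reported F can be retraced from the ratio used earlier for testing D² values, F = [n1 n2 / (n1 + n2)] (1/n) [(n - p + 1)/p] D². The sample sizes below are assumptions inferred from the text: 340 for the first sample (335 "within" degrees of freedom plus five groups), 188 for the second (the total of Table XI), and n = 526 so that n - p + 1 gives the reported 523. The function name is hypothetical.

```python
def d2_f_ratio(d2, n1, n2, n, p):
    """F ratio for a D-squared value, with p and (n - p + 1) degrees of freedom."""
    f_value = (n1 * n2 / (n1 + n2)) * (1.0 / n) * ((n - p + 1) / p) * d2
    return f_value, (p, n - p + 1)

# Inferred sample sizes (assumptions): 340 and 188, with n = 340 + 188 - 2
f_value, dfs = d2_f_ratio(d2=0.0506, n1=340, n2=188, n=526, p=4)
```

With these inputs `f_value` rounds to the reported 1.52 and `dfs` is (4, 523).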

This test serves to define more rigorously the 
population for which conclusions in this study are 
to be drawn. 





Summary and Conclusions 





This problem dealt with the analysis of differences,
with respect to four variables, between
five groups of college students. The five groups
selected for study were pre-business, pre-law,
pre-medical, special medical science, and unsuccessful
students, and the variables selected were
scores on the American Council on Education
Psychological Examination, Ohio State University
Psychological Test, Cooperative English Test, and
the high school percentile rank. Multivariate
statistical techniques were applied to the data.

The statistical analysis consisted of the following:
(1) tests of assumptions, (2) over-all test of
significance of difference between groups, (3)
generalized distance between groups, (4) dimensions
of variation, (5) linear discriminant functions,
(6) cross-validation, and (7) test of equivalence
of the two samples.

Significant D² values were obtained for all pairs
of groups except pre-business and pre-law. The
groups could have been represented in a space of
two dimensions, but a third dimension was included
for general interest. When the linear discriminant
functions were applied to a new sample,
classifications significantly better than chance were
made. The equivalence of the two samples was
established.

Two approaches to further research on this 
problem are recommended: 


1. Investigations to determine if more homo- 
geneous groups could be made. Perhaps it would 
be more fruitful to classify in terms of specific 
types of courses instead of curricula. 

2. Investigation of additional variables to de- 
termine their discriminating value. 
SOME APPLICATIONS OF THE METHOD OF
PIVOTAL CONDENSATION IN STATISTICAL
ANALYSIS


RAYMOND O. COLLIER, Jr. 
University of Minnesota 


Introduction 


THE EXISTENCE of a particular method of 
statistical analysis is only one of the necessary 
conditions for its utilization in any applied field. 
Equally important is the ease of calculation of the 
quantities involved in the method. The major 
purpose of this paper is that of considering the 
latter problem in connection with the Generalized 
Distance Statistic and related quantities. Pivotal 
condensation, the calculation technique to be illus- 
trated, was introduced by Rao (3,4,5). It is in- 
tended, therefore, that this paper in part be a 
clarification of Rao’s original work. 


Pivotal Condensation and the Generalized Dis- 
tance Statistic 





An application and discussion of the role of the 
Generalized Distance Statistic in multivariate dis- 
criminatory analysis has been made by Moonan 
(2). It will be recalled that the basic problem in 
such an analysis is essentially as follows. We de- 
sire to classify as accurately as possible certain 
individuals or items into various qualitative cate- 
gories on the basis of a priori information—the 
measurements on several variables of previous 
individuals who were members of these categories. 
In particular, Moonan was concerned with study- 
ing individuals who were members of the United 
States Senate. The variables employed were three 
groups of political bills. The measurements on 
these variables consisted of composite scores rep- 
resenting manner of voting and party affiliation. 
Qualitative categories considered were four geo- 
graphical sections of the country. 

One phase of the analysis usually involves the
calculation of D², an estimate of the distance
between the means of pairs of population categories.
These means, termed multivariate means, are
based on several variables instead of one alone.
Tests of significance of these D²'s are performed
in order to determine if the distances between
the population means actually exist.

$$D^2 = \sum_i \sum_h S^{ih} d_{ij} d_{hj} \qquad (1)$$

where $S^{ih}$ = the inverse matrix of $S_{ih}$, the
dispersion matrix (the variance-covariance matrix
formed by pooling the sums of squares and
cross products within groups), and


$d_{ij} = (\bar{X}_{ij} - \bar{X}_{ij'})$, the difference between the
two sample means of the ith variable for the
jth pair of groups (categories).


Since the inversion of a variance-covariance
matrix composed of many variables is prohibitively
laborious, it is fortunate that two other methods
exist for the computation of D² which do not
require the explicit inversion of a dispersion
matrix, namely (a) a transformation of the correlated
variables to a new orthogonal set, and (b) a
calculation of the relationship


ay | 





Both of these methods may be performed by the
technique of pivotal condensation. Fundamentally,
pivotal condensation is a process of reducing an
array of numbers to a form with desirable
characteristics. These characteristics differ depending
on the statistical quantity required in the analysis.
To accomplish a reduction by pivotal condensation,
the elements of a certain row are multiplied by a
factor such that subtraction of the result from
other rows yields a column of zeros. This concept
will be clarified in subsequent illustrations.

Let us consider first method (a). Reference to
formula (1) allows us to make certain statements
relative to the advisability of transforming to an
orthogonal set of variables. On transforming, the
covariance elements in $S_{ih}$ become zero and only
diagonal elements, the variances of the transformed
variables, remain. If, furthermore, the variables
are standardized (by dividing each by its
corresponding standard deviation), the variances reduce
to unity. D² is then simply expressed as
$D^2 = \sum_i {}^*d_{ij}^2$, where ${}^*d_{ij}$ represents the
difference between the jth pair of sample means for the
ith variable, standardized and transformed. If on the
other hand the variables have not been standardized,
$D^2 = \sum_i {}^*d_{ij}^2 / {}^*S_{ii}$, in which ${}^*S_{ii}$, the
variance of the transformed ith variable, is found
easily in the process of pivotal condensation.

This transformation to an uncorrelated set of
variables, expressed algebraically, may be found in
(4); although it is a more efficient process than
formula (1) for a large panel of variables, it is
accomplished even more efficiently by pivotal
condensation.

Let us carry through the transformation on data
from Moonan's application. His dispersion matrix,
$S_{ih}$, is given as:
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2.5425 .1707 -.2682 


.1707 .5575 -.0670| (3) | 


~.2682 -.0670 .2477 | 


The transformation is accomplished by append- 
ing a unitary matrix to the original dispersion ma- 
trix and reducing as far as possible by pivotal con- 
densation. Since (3) is symmetrical there is no 
need for repeating elements below the diagonal el- 
ements. The actual reduction is given in Table I. 

From Table I we may obtain the transforma- 
tion formulas: 


$$\begin{aligned}
Y_1 &= X_1 \\
Y_2 &= X_2 - .0671\,X_1 \\
Y_3 &= X_3 + .0897\,X_2 + .0995\,X_1
\end{aligned}$$


Also available are the variances of the transformed variables. These
have been underlined in Table I. Thus ${}^*S_{11}$, ${}^*S_{22}$,
${}^*S_{33}$ are 2.5425, .5460, and .2150, respectively.
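
The reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reproduction of Table I: the function name is hypothetical, the pivot rows are not normalized to unity (which leaves the pivots and the transformation unchanged), and no check columns are carried. The input is the dispersion matrix (3) with a unit matrix appended.

```python
def condense_with_unit_matrix(S):
    """Forward pivotal condensation of [S | I].

    Returns the pivots (variances of the transformed variables) and the
    rows of the reduced unit matrix (coefficients of Y in terms of X).
    """
    n = len(S)
    # Append a unit matrix to the right of the dispersion matrix.
    rows = [[float(v) for v in S[i]] + [1.0 if i == j else 0.0 for j in range(n)]
            for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            # Multiply the pivotal row by a factor and subtract it so that
            # column k becomes zero below the pivot.
            factor = rows[i][k] / rows[k][k]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[k])]
    variances = [rows[k][k] for k in range(n)]
    transform = [row[n:] for row in rows]
    return variances, transform


# Dispersion matrix (3) from the text.
S = [[2.5425, 0.1707, -0.2682],
     [0.1707, 0.5575, -0.0670],
     [-0.2682, -0.0670, 0.2477]]

variances, transform = condense_with_unit_matrix(S)
for name, coeffs, var in zip(("Y1", "Y2", "Y3"), transform, variances):
    print(name, [round(c, 4) for c in coeffs], "variance", round(var, 4))
```

Each row of `transform` holds the coefficients of $X_1$, $X_2$, $X_3$ in the corresponding transformed variable, so the rows can be compared directly with the transformation formulas and variances quoted in the text.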

Note that we have carried two check columns. 
Both the first and second check column entries 
for the first three rows consist of the sums of 
these rows. The remaining entries in check col- 
umn one are made by operating on the appropriate 
previous entries as indicated in the operation col- 
umn. The second check column entries are made 
by summing across the rows including elements 
which are omitted because of the symmetry of 
$S_{ih}$.

Computation of $D^2$ for two groups from these transformed variables
is, as previously indicated,

$$D^2 = \sum_i \frac{{}^*d_{ij}^2}{{}^*S_{ii}} \tag{4}$$

since we have used unstandardized variables. Table II and Table III
give the differences in original and transformed means of the variables
between the various group pairs. $D^2$'s for each of the pairs of
groups have been calculated and are presented in Table IV.
It is interesting at this point to note that in the formula
$D^2 = \sum_i {}^*d_{ij}^2 / {}^*S_{ii}$ each term
${}^*d_{ij}^2 / {}^*S_{ii}$ represents the contribution of the ith
variable to the total $D^2$ if the order of the variables in $S_{ih}$
is unaltered.
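
As a worked check of the unstandardized formula, the short Python sketch below sums the squared transformed differences divided by the transformed variances. The variances are those quoted in the text (2.5425, .5460, .2150) and the three difference columns are the legible columns of Table III. The dictionary keys and function name are illustrative only.

```python
# Variances of the transformed variables, as quoted in the text.
variances = [2.5425, 0.5460, 0.2150]

# Legible columns of Table III (differences in transformed means).
transformed_diffs = {
    "*d_i2": [-1.5392, 0.3368, -0.0867],
    "*d_i3": [-1.3409, 0.2218, 0.1739],
    "*d_i4": [2.3233, -0.1243, 0.3173],
}


def d_squared(diffs, variances):
    """Return D^2 and the per-variable contributions *d^2 / *S."""
    contributions = [d * d / s for d, s in zip(diffs, variances)]
    return sum(contributions), contributions


results = {name: d_squared(col, variances) for name, col in transformed_diffs.items()}
for name, (total, parts) in results.items():
    print(name, round(total, 4), [round(p, 4) for p in parts])
```

The three totals agree, to rounding, with the first three legible entries of Table IV (1.1745, .9379, 2.6196), and the per-variable terms are the contributions discussed in the preceding paragraph.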

Calculation technique (b) remains to be presented. This method is the
more informative of the two treated. Not only is the value of $D^2$
between two groups given, but also the contribution of each variable on
which $D^2$ is based is obtained. Consequently, it will be possible to
test the significance of each of the added variable contributions or
combinations thereof.

Treating the same data as before, we arrange the elements of $S_{ih}$
and $d_{ij}$ as follows:

$$\begin{bmatrix}
S_{11} & S_{12} & S_{13} & d_{1j} \\
S_{12} & S_{22} & S_{23} & d_{2j} \\
S_{13} & S_{23} & S_{33} & d_{3j} \\
d_{1j} & d_{2j} & d_{3j} & 0
\end{bmatrix} \tag{5}$$


The reduction of this array by pivotal condensation will give values of
$D_p^2$, i.e., $D^2$ based on p variables of the jth group pair. For
example, $D_1^2$ is the $D^2$ based on the first variable, $D_2^2$ is
the $D^2$ based on the first and second variables, etc. Proceeding
with an actual application in Table V, observe that the dispersion
matrix, $S_{ih}$, for the three variables was taken from (3). The
differences between the means on each variable (factors) of the
different groups (sections) taken pair-wise may be read from Table II.

Instead of using (5) and calculating $D^2$'s for the jth group pair
alone, one may list $d_{i1}$, $d_{i2}$, etc., in successive columns.
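
A minimal sketch of method (b) follows: the dispersion matrix is bordered by one column and one row of differences with a zero in the corner, and after each pivotal step the negative of the corner element is recorded as $D^2$ on the variables used so far. The function name is hypothetical. The difference vector is an assumption: it is back-computed from the first legible column of Table III through the transformation formulas, because the original differences are not reproduced in this section.

```python
def d_squared_by_bordered_array(S, d):
    """Reduce the bordered array [[S, d], [d', 0]] by pivotal condensation.

    Returns [D^2_1, D^2_2, ...], where D^2_p uses the first p variables.
    """
    n = len(S)
    rows = [[float(v) for v in S[i]] + [float(d[i])] for i in range(n)]
    rows.append([float(v) for v in d] + [0.0])
    stepwise = []
    for k in range(n):
        for i in range(k + 1, n + 1):
            factor = rows[i][k] / rows[k][k]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[k])]
        # After pivot k the corner element equals minus D^2 on k + 1 variables.
        stepwise.append(-rows[n][n])
    return stepwise


S = [[2.5425, 0.1707, -0.2682],
     [0.1707, 0.5575, -0.0670],
     [-0.2682, -0.0670, 0.2477]]

# Assumed original-scale differences (back-computed, see lead-in).
d = [-1.5392, 0.2335, 0.0455]

d2_steps = d_squared_by_bordered_array(S, d)
print([round(v, 4) for v in d2_steps])
```

Successive differences of the returned values are the added contributions of the second and third variables, which is what the significance test discussed next operates on.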

In Table V, $D_1^2$, $D_2^2$, and $D_3^2$ are the values of $D^2$ based
on the first, the first two, and all three variables for each of the
six pairs of groups. We may use the following test for testing the
significance of the contribution of adding q variables to the $D^2$ on
p variables.


$$T = \frac{f - p - q + 1}{q} \cdot \frac{n_1 n_2 \,(D_{p+q}^2 - D_p^2)}{f\,(n_1 + n_2) + n_1 n_2 D_p^2}$$

where f = the degrees of freedom of the dispersion matrix of variances
and covariances between the ith and jth variables

p = number of variables in $D_p^2$

q = number of variables added to form $D_{p+q}^2$

$n_1$ = size of sample from the first population in the $D^2$ considered
TABLE III

DIFFERENCES IN TRANSFORMED MEANS

(Only three columns are legible in the source.)

Variable     *d_i2       *d_i3       *d_i4
   1        -1.5392     -1.3409      2.3233
   2          .3368       .2218      -.1243
   3         -.0867       .1739       .3173


TABLE IV

VALUES FOR D² BETWEEN PAIRS OF GROUPS

(Column headings are only partly legible: South-East, West-East,
Midwest, Midwest-South, -West.)

1.1745     .9379     2.6196     1.7913


TABLE VI

CALCULATION OF l_i's FOR SOUTH GROUP

(Most entries are illegible in the source; legible entries are listed
in the order in which they appear.)

Operation                         Legible entries
X_1                               .1707   .26   .5517
X_2                               .5575   .5517
X_3                               .0000

(1) ÷ 2.5425                      1.0000   .0671   .0036
(2) - .1707 × (4)                 .5460   .3804
(3) - (-.2682) × (4)              .2692

(5) ÷ .5460                       .0000   .6967
(6) - (-.0490) × (7)              .3033

(8) ÷ .2150                       .0000   5.0619
Subst. (9) in (7)                 .2405
Subst. (9) and (10) in (4)        .5599


TABLE VII

REDUCTION OF DETERMINANT M

(Column entries are illegible in the source; only the operations
survive.)

Operation
(1) ÷ 5
(2) - 3 × (4)
(3) - 6 × (4)

(5) ÷ 14/5
(6) - (-2/5) × (7)
$n_2$ = size of sample from the second population


Rao (3) has shown that T is F-distributed with 
q and f - p - q + 1 degrees of freedom.
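
The test statistic can be written as a small Python function. This is a sketch that assumes T is the product of (f - p - q + 1)/q and the ratio of $n_1 n_2 (D_{p+q}^2 - D_p^2)$ to $f(n_1 + n_2) + n_1 n_2 D_p^2$, consistent with the degrees of freedom stated above. The function name is hypothetical and the numbers in the usage line are not taken from Table V.

```python
def added_information_T(d2_p, d2_pq, p, q, n1, n2, f):
    """T for the contribution of q added variables to D^2 on p variables.

    Under the null hypothesis T is referred to the F distribution with
    q and f - p - q + 1 degrees of freedom.
    """
    numerator = n1 * n2 * (d2_pq - d2_p)
    denominator = f * (n1 + n2) + n1 * n2 * d2_p
    return (f - p - q + 1) / q * numerator / denominator


# Hypothetical values, chosen only to exercise the function.
T = added_information_T(d2_p=1.0, d2_pq=2.0, p=1, q=1, n1=10, n2=10, f=18)
df = (1, 18 - 1 - 1 + 1)
print(round(T, 4), df)
```

The resulting T would then be compared with a tabled F value for the stated degrees of freedom.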

From Table V, we may also calculate the dis- 
criminant coefficients, using information already 
calculated. The discriminant equation for any 
group in our example is given by: 

$$L_j = l_1 X_1 + l_2 X_2 + l_3 X_3 - \frac{1}{2}\Big(\sum_{i=1}^{3} l_i \bar{X}_{ij}\Big) + \log_e \pi_j, \tag{6}$$

where $l_i = \sum_h s^{ih}\,\bar{X}_{hj}$ for $i = 1, 2, 3$,

$\pi_j$ = the relative frequency of individuals in the jth group,
assumed to be known.


The $l_i$'s, the discriminant coefficients, computed from (6), again
require the inverse of $S_{ih}$. This author has found it practicable
to compute these coefficients by using the information of Table V. The
computation is accomplished by replacing the $d_{ij}$ column by
$\bar{X}_{ij}$. In practice one may simply lay a strip of paper over
the $d_{ij}$ columns and enter the means for the various groups in the
same order and position as the differences had been entered. As an
example, the calculation of the $l_i$'s for the discriminant equation
of Moonan's South group is shown in Table VI.


The Evaluation of a Determinant by Pivotal Con- 
densation 


Up to this point we have illustrated the method of pivotal condensation
as it applies to the generalized distance statistic. Let us now
consider a problem which arises in many types of analysis: the
evaluation of a determinant. Pivotal condensation is equally efficient
here. Our purpose will be to reduce the original determinant to a
triangular determinant with zeros below the diagonal elements. The
value of the determinant is then simply the product of the diagonal
elements.

Suppose one desires to evaluate determinant M:

$$M = \begin{vmatrix} 5 & 2 & 1 \\ 3 & 4 & 2 \\ 6 & 2 & 3 \end{vmatrix}$$

The actual reduction of the determinant preparatory to its evaluation
is shown in Table VII.

Thus far we have reduced the array by successive reduction so that if
we were to write the resultant determinant it would appear as

$$\begin{vmatrix} 5 & 2 & 1 \\ 0 & 14/5 & 7/5 \\ 0 & 0 & 2 \end{vmatrix}$$

Upon expansion by elements of the first column, the value of this
determinant is

$$M = 5 \times 14/5 \times 2 = 28.$$


In pivotal condensation the underlined elem- 
ents 5, 14/5, and 2 are termed pivotal elements. 
The general procedure then for determinantal 
evaluation is to reduce to a triangular determin- 
ant by pivotal means and evaluate by calculating 
the product of the pivotal elements. In practice 
it is advisable to incorporate check columns as 
was previously done. 
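
The procedure can be sketched in Python with exact fractions. Two points are assumptions rather than part of the source: the entries of M used below are reconstructed from the operations listed in Table VII and from the inverse printed later in the text, and the function (a hypothetical name) does no row interchange, so it stops on a zero pivot. Check columns are omitted.

```python
from fractions import Fraction


def determinant_by_pivotal_condensation(matrix):
    """Reduce to triangular form and return (determinant, pivotal elements)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n = len(rows)
    pivots = []
    for k in range(n):
        pivot = rows[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        pivots.append(pivot)
        for i in range(k + 1, n):
            # Subtract a multiple of the pivotal row to zero column k below it.
            factor = rows[i][k] / pivot
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[k])]
    value = Fraction(1)
    for pivot in pivots:
        value *= pivot
    return value, pivots


# Assumed entries of M (reconstructed, see lead-in).
M = [[5, 2, 1],
     [3, 4, 2],
     [6, 2, 3]]

value, pivots = determinant_by_pivotal_condensation(M)
print(value, [str(p) for p in pivots])
```

With these entries the pivotal elements are 5, 14/5, and 2, and their product is 28, in agreement with the value given in the text.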


Inversion of a Matrix 


The concept of pivotal condensation is used in a slightly different
manner for the inversion of a matrix. To the matrix for which an
inverse is sought, we append a unitary matrix. We proceed in the
reduction process, but toward the objective of reducing the original
matrix to a unitary matrix. Thus for all types of matrices we must
operate both above a given pivotal row and below that row. (A pivotal
row is the row used for reducing various elements.)

Let us apply the method to the elements of the 
previous determinant considered now as a ma- 
trix. 


The calculations are indicated in Table VIII. 

From rows (10), (11), and (12) we may obtain the original matrix, R,
reduced to a unitary matrix and the original appended unitary matrix
transformed to the inverse of R. Thus the inverse of R is

$$R^{-1} = \begin{bmatrix} 10/35 & -5/35 & 0 \\ 15/140 & 45/140 & -1/4 \\ -45/70 & 5/70 & 1/2 \end{bmatrix}$$
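
A minimal Python sketch of the inversion follows. It appends a unit matrix and reduces both above and below each pivotal row, as described. The entries of R are an assumption (the text takes them from the determinant of the preceding section, whose entries are reconstructed here), the function name is hypothetical, and no row interchange or check column is included.

```python
from fractions import Fraction


def invert_by_pivotal_condensation(matrix):
    """Reduce [R | I] to [I | R^-1] and return the right-hand block."""
    n = len(matrix)
    rows = [[Fraction(v) for v in matrix[i]]
            + [Fraction(int(i == j)) for j in range(n)]
            for i in range(n)]
    for k in range(n):
        pivot = rows[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        # Divide the pivotal row by the pivotal element.
        rows[k] = [v / pivot for v in rows[k]]
        # Clear column k both above and below the pivotal row.
        for i in range(n):
            if i != k:
                factor = rows[i][k]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[k])]
    return [row[n:] for row in rows]


# Assumed entries of R (reconstructed, see lead-in).
R = [[5, 2, 1],
     [3, 4, 2],
     [6, 2, 3]]

R_inverse = invert_by_pivotal_condensation(R)
for row in R_inverse:
    print([str(v) for v in row])
```

The fractions are printed in lowest terms (for example 2/7 rather than 10/35), but they are equal to the entries of the inverse given in the text.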


Summary 


Application of the method of pivotal condensation has been made to
certain quantities in discriminatory analysis, in determinantal
evaluation, and the inversion of a matrix. In the case of
discriminatory analysis the data from an actual problem (in accurately
classifying individuals into qualitative categories on the basis of
several variables) were utilized. It is hoped that these illustrations
will prove helpful to workers in the field. The author has found that
in the main they are unusually efficient and easily performed.
TEST INSTRUCTIONS AND SCORING METHOD
IN TRUE-FALSE TESTS


EVAN R. KEISLAR 
University of California 
Los Angeles, California 


Introduction 


IN EDUCATIONAL institutions where informal true-false tests are still
widely used, disagreement exists as to the advisability of using the
correction formula whereby a student's score is the number of items he
gets right minus the number he gets wrong. It has been well established
that a correction formula should be used in scoring true-false tests
where time limits are such that most students do not have time to
finish the test.1 But for those tests where most students have enough
time to answer every item the use of the correction formula depends on
the test instructions. If the instructions to the student are ‘‘Do not
guess,’’ the correction formula should be used.2 On the other hand, if
the instructions are to answer every item, it makes little difference
whether the score is number right or number right minus wrong. Assuming
the students follow instructions, the correlation between the two sets
of scores would be 1.00.

Studies generally agree that ‘‘Do not guess’’ instructions (correction
formula being used) yield reliabilities which are somewhat higher than
‘‘Guess’’ instructions.3 But it has been argued that ‘‘Do not guess’’
instructions place students with certain personality characteristics at
a disadvantage. Votaw4 found, for example, that submissive students (as
defined on Allport's A-S Reaction Study) under ‘‘Do not guess’’
instructions omitted more items than the ascendant students.


1. Harold Gulliksen, Theory of Mental Tests (New York: John Wiley and
Sons, 1950), pp. 245-248.

2. J. W. Dunlap, A. DeMellow, and E. E. Cureton,


When asked to give their answers to the items 
they left blank, such students improved their 
scores much more than did the dominant students. 
Votaw concluded that the use of ‘‘Do not guess’’ 
instructions and the scoring formula results in 
the measurement of some personality trait or re- 
sponse set unrelated to achievement. He infers 
that while the use of ‘‘Do not guess’’ instructions 
increases reliability, the validity is decreased. 
The work of Soderquist 5 and Swineford 6 also in- 
dicates the possibility that ‘‘Do not guess’’ in- 
structions result in the measurement of some per- 
sonality variable such as the ‘‘tendency to gamble.’’ 

Contamination of the achievement test score 
by such personality factors seems likely if the 
test instructions contain a threatening element. 
For example, the set of instructions used in one 
investigation of the merits of the right-minus- 
wrong method of scoring tests was in part as fol- 
lows: 


If you are in doubt about the answer to 
any question, leave it blank. Do not guess! 
You will be penalized for all wrong answers. 
The tests are scored in such a way that you 
will lose more than you gain by guessing. 7 


Other instructions frequently found where the correction formula is to
be employed italicize or capitalize the words: ‘‘DO NOT GUESS’’ or
‘‘GUESSING WILL BE PENALIZED.’’ Faced with
this situation certain students may well leave it- 


‘‘The Effects of Different Directions and Scoring Methods on the
Reliability of the True-False Tests,’’ School and Society, (September
14, 1929), pp. 378-382.


3. W. W. Cook, ‘‘Achievement Tests,’’ in Encyclopedia of Educational
Research, Walter [...], Revised Edition (New York: Macmillan Co.,
1950), pp. 1461-1478.

4. David F. Votaw, ‘‘The Effect of Do-Not-Guess Directions Upon the
Validity of True-False or Multiple Choice Tests,’’ Journal of
Educational Psychology, XXVII (December 1936), pp. 696-703.

5. Harold O. Soderquist, ‘‘A New Method of Weighting Scores in a
True-False Test,’’ Journal of Educational Research, XXX (December
1936), pp. 290-292.
ems blank which they could answer correctly better than chance. If
these students are thus penalized for not being willing to gamble, the
validity of the test with threatening ‘‘Do not guess’’ instructions is
subject to some question. In view of this, many authorities 8 recommend
that the instructions should merely indicate to the student how the
test is to be scored; nothing is said as to whether he should guess or
not. It is the aim of the present study to evaluate the use of the
correction formula with this type of instruction.


The Problem 


The purpose of this investigation was to com- 
pare the results of a true-false test administered 
to upper division college students when the test 
is given and scored under the following two con- 
ditions: Method R. Students are given instruc- 
tions that the test will be scored number right; 
they are not told to guess. Method R-W. Students 
are given instructions that the test will be scored 
number right minus number wrong; nothing is 
said about not guessing. The two methods are to 
be compared with respect to (1) reliability com- 
puted by the split-half method, and (2) validity by 
estimating the extent to which students are pen- 
alized by leaving items blank in Method R-W. 


The Procedure 


To obtain a comparison between the above 
two types of scoring methods and test instructions 
it is desirable that the subjects, the testing situa- 
tion, and the test items be held constant. Consequently, the
procedure adopted was to have a group
of students answer all the items on one test
according to both methods during the same sit- 
ting. To ensure that the usual test motivation 
was operative for each method, students were 
informed that a coin would be tossed at the end 
of the examination to decide which method of scor- 
ing would be used for the purposes of course eval- 
uation. 9 The wording of these instructions for 
the true-false test follows: 


You are to answer each of the follow- 
ing true-false items according to two dif- 
ferent methods. Record your answers on 
the special answer sheet with the proper 
pencil. Spaces 1 and 2, opposite the 
number for each item, are reserved for
the first method; spaces 4 and 5 are re- 
served for the second method. Answers 
given according to the first method will 
be scored on the basis of number right; 
answers given according to the second 
method will be scored on the basis of 
number right minus number wrong. Af- 
ter the examination is over a coin will 
be tossed to decide which one of these 
two methods will be used to score this 
examination. 


For the first method, scored number 
right, use spaces 1 and 2 on the answer 
sheet opposite the number of the item. 

If the item is true blacken space 1; if the
item is false blacken space 2. Your score
on the examination according to this meth- 
od will be simply the number of items you 
get right. Items not answered will be 
counted as wrong answers. 


For the second method, scored num- 
ber right minus number wrong, use 
spaces 4 and 5 on the answer sheet op- 
posite the number of the item. If the 
item is true blacken space 4; if it is 
false blacken space 5. Your score on 
the examination according to this meth- 
od will be the number of items you get 
right minus the number you get wrong. 
Items not answered will be counted 
neither right nor wrong. 


It should be noted that the response set induced 
by this experimental design is not only a function 
of the non-directive instructions offered here 
but also of the ‘‘Do not guess’’ type of instruc-
tions which the students have undoubtedly encoun- 
tered in previous tests where the correction form- 
ula has been used. As such the study probably 
represents a fairly typical condition which most 
instructors would face in using this type of in- 
structions. 

The subjects were 108 students enrolled in an upper division course in
educational psychology. This test design was used for the first two
examinations in the course, the first examination consisting of 65
true-false items, the second of 72 true-false items. The two
examinations were





Guess' Instruction in Multiple Response Tests,” Journal of Educational Psychology, 
XVII (September 1926), pp. 368-375. 





8. C. C. Ross, Measurement in Today's Schools, Second Edition (New
York: Prentice-Hall Inc., 1947), p. 119.

9. Both examinations were scored number right. Students were given
their scores but were not allowed to see the papers for the first test
until after the second examination.
scheduled four weeks apart during the first half
of the semester.


For each student scores were obtained for the following variables: the
number right on Method R of scoring each examination, hereafter
referred to by $R_1$ and $R_2$ for the first and second examinations
respectively; the number right minus wrong on Method R-W of scoring
each examination, $(R-W)_1$ and $(R-W)_2$; the number of items left
blank on Method R-W, $B_1$ and $B_2$; the number of blank items on
Method R-W which were marked right on Method R; and the number of such
items marked wrong.


Results 


1. Reliabilities of the two methods: As is indicated in Table I, the
corrected reliability of R-W is higher than that of R for each
examination. The significance of the difference between these two
coefficients for each examination must be computed by taking into
consideration the fact that the reliability correlation coefficients
are themselves correlated. Since this can be undertaken for the
corrected reliability coefficients only by using estimated figures, it
is perhaps safer to compute the significance of the difference between
the obtained uncorrected reliability figures. The computation of
$r_{r_{12} r_{34}}$ and the significance of the difference between the
corresponding z's is indicated for each examination in Table II.

These results show that for Examination I the 
difference between the uncorrected reliabilities 
is not significant but for Examination II the dif- 
ference is significant at the .002 level. It is 
reasonable to infer that the significance of the 
difference between the corrected reliability fig- 
ures is as high as or higher than that for the un- 
corrected. 10 
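
The test for two correlated reliability coefficients can be sketched in Python from the formulas given with Table II. The sketch assumes the standard form of formula (a) for the correlation between $r_{12}$ and $r_{34}$ and takes the standard error of the difference between the z's as the square root of $(2 - 2r)/(N - 3)$. Function names are hypothetical, and the probability is a two-tailed normal-curve value. The usage line takes the Examination II figures legible in Table II (z difference .098, correlation .946, N = 108).

```python
import math


def correlation_between_correlations(r12, r13, r14, r23, r24, r34):
    """Formula (a): correlation between r12 and r34 in the same sample."""
    total = ((r13 - r12 * r23) * (r24 - r23 * r34)
             + (r14 - r13 * r34) * (r23 - r12 * r13)
             + (r13 - r14 * r34) * (r24 - r12 * r14)
             + (r14 - r12 * r24) * (r23 - r24 * r34))
    return total / (2.0 * (1.0 - r12 ** 2) * (1.0 - r34 ** 2))


def z_difference_test(z_diff, r_between, n):
    """Formula (b): standard error, critical ratio, two-tailed probability."""
    sigma = math.sqrt((2.0 - 2.0 * r_between) / (n - 3))
    ratio = z_diff / sigma
    p_value = math.erfc(abs(ratio) / math.sqrt(2.0))
    return sigma, ratio, p_value


sigma, ratio, p_value = z_difference_test(z_diff=0.098, r_between=0.946, n=108)
print(round(sigma, 3), round(ratio, 1), round(p_value, 3))
```

With the Examination II figures this gives a standard error of about .032 and a critical ratio of about 3.1, in line with the values reported in Table II.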

2. Relationship between R and R-W: The cor- 
relations between R and R-W indicated in Table 
III would suggest that under these test conditions 
the two methods are so similar in their results 
that it would make little difference in practice 
which methods were used. The fact that the re- 
liability of the R-W method appears to be higher 
is probably not enough to warrant a preference 
for its use under these testing conditions. 


10. The significance of the difference between 
3. Number of blank items: The average number of items left blank in
Method R-W on the first examination was 11 percent of the total and on
the second examination 8 percent (Table I). This is in spite of the
fact that the tests were relatively difficult, as shown by mean scores.
The distribution of B for each examination is skewed with most
individuals omitting relatively few items and some omitting a large
number. The reliability of this response set is very high for each
examination. The moderate correlation between $B_1$ and $B_2$ (Table
III) suggests that this response set has a certain stability over the
four-week period as related to the examinations in this course. The
fact that there is no significant correlation between the number of
items left blank and either one of the measures of achievement, R or
R-W, offers no support to the thesis that good students omit more or
fewer items than poor students. While this thesis is not disproved, the
relationship if it does exist is slight.


4. Penalization of students who leave items 
blank on Method R-W: If students who hold a re- 
sponse set to omit items are penalized by not 
showing how much they know, the validity of the 
R-W method is lowered. To test this hypothesis 
of penalization, the third of the class who left 
the most items blank were selected for each ex- 
amination. For each individual an improvement 
score was calculated, this index being defined as
the number of items omitted on Method R-W which 
were correctly marked on Method R minus the 
number of such items marked incorrectly. Since 
this score represents the improvement which 
would have occurred had the individual left no 
items blank, it indicates the amount of penaliza- 
tion by the R-W method. For Examination I the 
mean improvement score of the 36 individuals 
was -.97 with a standard deviation of 3. 85; the 
group selected for Examination II showed a mean 
of -.28 with a standard deviation of 3.05. Neither 
mean differs significantly from zero. We may, 
therefore, not reject the hypothesis that the indi- 
viduals who left many items blank were not un- 
fairly penalized as a group. 
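
The improvement score and the test of its mean against zero can be illustrated with a short Python sketch. The answer vectors are invented for illustration, the function names are hypothetical, and the standard error is taken as the standard deviation divided by the square root of n, which is an assumption because the article does not state the formula used.

```python
import math


def improvement_score(rw_answers, r_answers, key):
    """Items omitted on Method R-W: number right on Method R minus number wrong.

    Answers are True, False, or None (blank).
    """
    score = 0
    for rw, r, correct in zip(rw_answers, r_answers, key):
        if rw is None and r is not None:
            score += 1 if r == correct else -1
    return score


def critical_ratio(mean, sd, n):
    """Mean divided by its standard error (sd / sqrt(n) assumed)."""
    return mean / (sd / math.sqrt(n))


# Invented four-item example: three items left blank under Method R-W.
example = improvement_score(
    rw_answers=[None, True, None, None],
    r_answers=[True, True, False, True],
    key=[True, True, True, False],
)

# Reported means and standard deviations for the 36 selected students.
ratio_exam_1 = critical_ratio(-0.97, 3.85, 36)
ratio_exam_2 = critical_ratio(-0.28, 3.05, 36)
print(example, round(ratio_exam_1, 2), round(ratio_exam_2, 2))
```

Both ratios are well under 2 in absolute value, which is consistent with the statement that neither mean differs significantly from zero.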


There is a possibility that these mean improve- 
ment scores while low represent a relatively good 
showing in comparison with the other students in 
the class. To test this hypothesis for each exam- 
ination, two matched groups of 24 individuals 


the corrected reliability coefficients for Examination I was computed
by estimating the intercorrelations among the four scores which two
forms of the entire test would yield. This involves correcting the
intercorrelations among the split halves for the fact that the longer
tests reduce attenuation. The test of significance of the difference
was carried out according to the method followed in Table II. On this
basis the difference between the corrected reliabilities of R and R-W
for Examination I was found to be significant at the .05 percent level.
Admittedly the use of such estimated figures involves considerable
approximation error.
TABLE II

COMPARISON OF SPLIT-HALF RELIABILITIES OBTAINED BY METHOD R
AND METHOD R-W ON EXAMINATIONS I AND II

(Several entries are illegible in the source; only legible entries are
listed.)

Examination I:   r_12 = .453    r_13 = .949    r_14 = .466    z_12 = .489
Examination II:  r_34 = .599    z_34 = .691

                                   Examination I      Examination II
r_{r12 r34} (a)                        .90                .946
σ_{z12 - z34} (b)                      .042               .032
(z_12 - z_34) / σ_{z12 - z34}          1.1 (P = .27)      .098/.032 = 3.1 (P = .002)

a* $$r_{r_{12} r_{34}} = \frac{1}{2(1 - r_{12}^2)(1 - r_{34}^2)} \Big[ (r_{13} - r_{12} r_{23})(r_{24} - r_{23} r_{34}) + (r_{14} - r_{13} r_{34})(r_{23} - r_{12} r_{13}) + (r_{13} - r_{14} r_{34})(r_{24} - r_{12} r_{14}) + (r_{14} - r_{12} r_{24})(r_{23} - r_{24} r_{34}) \Big]$$

b* $$\sigma_{z_{12} - z_{34}} = \sqrt{\frac{2 - 2\,r_{z_{12} z_{34}}}{N - 3}}$$
where $r_{z_{12} z_{34}}$ is assumed to be equal to $r_{r_{12} r_{34}}$.

Subscript Notation: Figures 1 and 2 refer to the odd and even split-halves
of R (Method R) respectively. Figures 3 and 4 refer to the odd and even
split-halves of R-W (Method R-W) respectively.

*Formulas for a and b are taken from Quinn McNemar, Psychological
Statistics (New York: John Wiley and Sons, 1949), p. 125.


TABLE III

INTERCORRELATIONS AMONG R, R-W, AND B ON
EXAMINATIONS I AND II
(N = 108)

(Values in the right-hand column are illegible in the source.)

R_1 and R_2              .57      R_1 and B_1
(R-W)_1 and (R-W)_2      .63      R_2 and B_2
B_1 and B_2              .67      (R-W)_1 and B_1
R_1 and (R-W)_1          .96      (R-W)_2 and B_2
R_2 and (R-W)_2          .99

*Not significant at the .05 level.
apiece were formed. One member of each matched pair was selected from
those with the highest number of blank items, the other from those who
left blank two items or less. They were matched on the basis of R
scores. For each member of a pair an R-W score was computed on the
items omitted by the member with the high blank score. The difference
between these two scores was then obtained.

Results show that the mean difference in scores 
between the paired members for the first examin- 
ation was 2.0, the standard deviation being 6. 9, 
in favor of the group with low blank scores. For 
the second examination the mean difference was 
1.4, the standard deviation being 5.4, again in 
favor of those with low blank scores. At the .01 
level of confidence we may reject the hypothesis 
that a difference larger than 1.9 could be obtain- 
ed in favor of the group with the high blank scores. 
This same limit is 1.7 for the second examina- 
tion. These limits represent a standard score 
of less than .2 for either test. The average num- 
ber of blank items on which these scores were 
computed was 16 for the first examination and 15 
for the second. We can therefore be fairly con- 
fident that if the R-W method is penalizing stu- 
dents who leave items blank, the amount of pen- 
alization in relation to others in the class is cer- 
tainly small. There is a greater possibility on 
the contrary that such students as a group do not 
know the answers to the items left blank better 
than chance and thus are not being penalized at 
all. We may conclude that if students omit items
in accordance with some personality trait, the 
effect on the R-W scores is certainly very slight 
under the test conditions of this study. This may 
also be inferred by the high correlations between 
the two methods of giving the test. 

5. The reliability and validity of improvement scores: It may be argued
that although the individuals who leave many items blank do not as a
group do better than chance on these items, these persons are
consistent in their improvement or lack of it. If this is true, the
improvement score as previously defined must be reliable. The evidence
on this question is inconclusive. The uncorrected split-half
reliability of the improvement score for the 36 individuals with the
largest number of blank scores was .27 for the first examination and
.01 for the group selected for the second examination. The uncorrected
split-half reliability of the improvement score for the entire class
was .12 with a standard deviation of 2.9 for the first examination and
-.07 with a standard deviation of 2.2 for the second examination.
While none of these correlations differs significantly from zero, it
should be noted that the improvement score constitutes in effect the
result of a short test of variable length (an average of 14 and 13
items for the entire class on the first and second examinations
respectively). As such, it could not be expected to show high
reliability. The confidence limits are large enough to prevent one
from rejecting the possibility that these items, considering their
small number, constitute a reliable measure of achievement, especially
in the case of the first examination.

The question as to whether good students are 
penalized more than poor students by the R-W 
method raises the problem of validity of the im- 
provement score. With such low obtained reli- 
abilities for this score, it is obvious that the ev- 
idence on this matter is insufficient. The R 
score may not be used as a criterion of validity 
since it is spuriously correlated with the improve- 
ment score (the latter being part of the former). 
The R-W score, while less unsatisfactory, is in- 
adequate since the good and poor students, if the 
improvement score is valid, do not get the scores 
they should. It is not surprising therefore that 
the correlations between the improvement score 
and R-W are for the entire class . 01 and .00 for 
the respective examinations. For the 36 individ- 
uals with the highest blank scores the correspond- 
ing correlations are .15 and . 01. 

Since the number of items left blank were rel- 
atively small for the class as a whole, it is not 
reasonable to expect positive findings with the 
improvement score. To test-wise college stu- 
dents these non-directive instructions did not 
cause them to omit many items in spite of the 
possible operation of sets to the right minus 
wrong method previously acquired under more 
threatening types of test instructions. 


Summary and Conclusions 





This study was designed to discover the merits of using the correction
formula in true-false tests at the college level when students are
informed only as to how the test would be scored. Two true-false
examinations, administered four weeks apart in an upper division course
in educational psychology, were taken by 108 students. For each
examination the subjects were requested to give two sets of answers to
each item, one set with instructions only that these answers would be
scored number right, the other set with instructions only that the
answers would be scored right minus wrong. The same test motivation was
used for the two sets of answers by informing the students that a coin
would be tossed to decide which one of the two sets would be used for
the purpose of assigning grades.

The following conclusions are to be interpre- 
ted as applying to those situations involving a 
comparable group of students and test materials. 
With less sophisticated students or with true-false items of lower
difficulty, for example, different results might have been obtained.
1. The use of the correction formula with this type of instructions
gives results so highly similar to the number right method (r's equal
.96 and .99 for the two examinations) that in actual practice it would
make little difference which method were used.

2. The uncorrected split-half reliability of the 
right minus wrong method of scoring was found 
to be slightly but significantly higher than that of 
the number right method in the second examina- 
tion. For the first examination the difference in 
reliability was in the same direction but was not 
significant. 

3. The average number of items left blank on 
the right minus wrong method was 11 percent for 
the first examination and 8 percent for the sec- 
ond. This response set to leave items blank 
showed high reliability for each examination (cor- 
rected r’s were .91 and . 95) and a fair degree of 
consistency over the four-week period between 
the examinations (r = .67). No significant rela- 
tionship was found between the number of omitted 
items and the test scores on either method for 
either examination. 

4. The third of the students who left blank the largest number of items
gave as a group answers which could be accounted for on the basis that
they were sheer guesses. If these students were being penalized at all
for not showing what they knew on the right minus wrong method, we can
be fairly confident that the amount of penalization for this group in
comparison with those who omitted few items is less than a standard
score of .2 on either test. If the response set to omit items is taken
as the measure of some personality trait, it seems fairly certain that
the effect of this personality characteristic is small under these
test conditions.

5. The evidence regarding the consistency of 
penalization (or the amount of improvement had 
all items on the R-W method been answered) was 
ambiguous largely because of the small number 
of items left blank by most students. While the 
hypothesis may be accepted that the variance con- 
tributed by these items is entirely error variance, 
we may not reject the hypothesis that scores de- 
rived from the answers given to the blank items 
are somewhat reliable and valid. 


6. Evidence elsewhere indicates that the at- 
tempt to reduce the amount of sheer guesses on 
true-false tests by strong ‘‘Do not guess’’ instruc- 
tions may cause students to omit responses which 
from the standpoint of test validity should be ans- 
wered. On the other hand, instructions like those 
used in this study which do not attempt to influ- 
ence the student to omit items may reduce the 
number of sheer guesses so slightly that the ad-
vantage of the correction formula is lost for prac-
tical purposes. Some merit may be found in the
practice of wording instructions so as to encour-
age students to answer on the basis of good
‘‘hunches’’ and at the same time to avoid wild 
guessing. But whether these instructions are do- 
ing what is desired needs to be studied on the pop- 
ulation to whom the test is given. Otherwise scor- 
ing on the basis of number right would appear to 
be quite satisfactory. 
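
As a minimal sketch of the two scoring methods compared in these conclusions, the following Python fragment scores a true-false answer sheet by number right and by right minus wrong, and correlates the two sets of scores. The answer key, the four answer sheets and all function names are hypothetical and are not taken from the study; a blank item is represented by None.

```python
from math import sqrt

def number_right(responses, key):
    """Number right: one point per correct answer; blanks earn nothing."""
    return sum(1 for r, k in zip(responses, key) if r is not None and r == k)

def right_minus_wrong(responses, key):
    """Correction formula for two-choice items: rights minus wrongs.
    Blank items (None) count as neither right nor wrong."""
    wrong = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return number_right(responses, key) - wrong

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical eight-item key and four hypothetical answer sheets.
key = [True, False, True, True, False, False, True, False]
sheets = [
    [True, False, True, True, False, False, True, False],  # every item right
    [True, False, None, True, True, False, None, False],   # two blanks, one wrong
    [False, True, True, None, False, True, True, False],   # one blank, three wrong
    [True, True, True, True, True, True, True, True],      # marks "true" throughout
]

nr_scores = [number_right(s, key) for s in sheets]
rw_scores = [right_minus_wrong(s, key) for s in sheets]
r = pearson_r(nr_scores, rw_scores)
print(nr_scores, rw_scores, round(r, 3))
```

On sheets like these the two scorings order the students in much the same way, which is the sense in which the correction formula and the number right method can give highly similar results.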








AN HISTORICAL VS. CONTEMPORARY PROBLEM
SOLVING USE OF THE COLLEGE PHYSICAL
SCIENCE LABORATORY PERIOD
FOR GENERAL EDUCATION


JAMES S. PERLMAN 
Moorhead State Teachers College 
Moorhead, Minnesota 


1. The Problem and Its Significance 


THE EXPERIMENTAL study summarized here 
involved a comparison of an historical as against 
a contemporary problem-solving use of the college 
physical science laboratory period for general ed- 
ucation purposes. For this experimental objec- 
tive, the following three null hypotheses were se- 
lected for test: 


a. That scientific thinking could not be learned, 
specifically, that devotion of college phys- 
ical science laboratory time to teaching con- 
sciously for scientific attitudes and abilities 
would not result in significant gains in these 
objectives. 


b. That there would be no significant differ-
ences in scientific problem-solving out-
comes between the particular case-history
and the contemporary problem-solving uses
of the laboratory period developed in this
study.


c. That there would be no differences in prob-
lem-solving outcomes and subject matter
achievement of these problem-solving meth- 
ods as against a good lecture-demonstration 
method without restrictions as to historical 
or contemporary apparatus. 


Past investigations on effective use of the sci-
ence laboratory period have been limited in scope.
They have been primarily concerned with the is-
sue of the individual laboratory versus lecture-
demonstration. This research has been conduct-
ed in general with outcomes limited to achieve-
ment in subject matter content and with experi-
mental designs and statistical techniques now ob-
solete. With rare exceptions, evaluation in terms
of scientific thinking has not existed. The unique-
ness of the science laboratory lies in the oppor-
tunities it provides for first-hand material and
first-hand evidence in a larger picture of problem-
solving. Its possibilities in this direction have
remained uninvestigated and unevaluated. This
becomes all the more serious when one notes that
in order to save expense and effort, present col-
lege physical science programs in general educa-
tion are almost entirely devoid of individual lab-
oratory periods.


2. Experimental Designs 


The experiment consisted of two sections, a 
primary and a secondary, with a separate design 
for each section. 

The primary design was a 2 X 2 randomized 
block based on considerations for randomization, 
replication and local control, the three prereq-
uisites for a self-contained experiment. Random-
ization, to the extent realized, enabled use of
the analysis of variance and covariance of R. A.
Fisher.1 Replication afforded, as Johnson2 em-
phasizes, ‘‘a valid estimate of experimental er-
ror.’’ This study realized replication by provid-
ing two groups of students for each experimental
method. Local control enabled the experiment 
to have its own basis of comparison and of con- 
clusions. The contrast of the two problem-solv- 
ing methods, the historical and the contemporary, 
gave this study its local control. 

1. R. A. Fisher, The Design of Experiments, Second Edition (London: Oliver and Boyd, 1946).

2. Palmer O. Johnson, Statistical Methods in Research (New York: Prentice-Hall,

The analysis of variance used is the analytical
process of breaking down the total sum of squares
of variation from the total mean into component
parts. These component sums of squares of vari-
ation were then identified with appropriate sources
and converted into mean squares through use of
proper degrees of freedom in each case. Since
this analysis involves an assumption of equal var-
iability within each of the groups in respect to the
particular characteristic measured, Welch’s
L-test of homogeneity of variances, as described
by Johnson, was applied.
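
The breakdown of the total sum of squares described above can be sketched as follows for a layout of two methods by two replications. This is only an illustration under stated assumptions: the scores are invented, the cells are taken to be of equal size so that the partition is exact, and the function name anova_2x2 is hypothetical. It does not reproduce the computations or the data of the study.

```python
def mean(values):
    return sum(values) / len(values)

def anova_2x2(cells):
    """Partition the total sum of squares for a balanced two-factor layout.

    cells maps (method, period) -> list of scores. Equal cell sizes are
    assumed here; with unequal sizes the simple partition no longer holds.
    """
    methods = sorted({m for m, _ in cells})
    periods = sorted({p for _, p in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("balanced cells are assumed in this sketch")

    everyone = [x for scores in cells.values() for x in scores]
    grand = mean(everyone)

    def level_mean(axis, level):
        # Mean of all scores sharing one level of one factor.
        return mean([x for k, s in cells.items() if k[axis] == level for x in s])

    ss = {
        "methods": n * len(periods)
        * sum((level_mean(0, m) - grand) ** 2 for m in methods),
        "replications": n * len(methods)
        * sum((level_mean(1, p) - grand) ** 2 for p in periods),
        "within": sum((x - mean(s)) ** 2 for s in cells.values() for x in s),
        "total": sum((x - grand) ** 2 for x in everyone),
    }
    between_cells = n * sum((mean(s) - grand) ** 2 for s in cells.values())
    ss["interaction"] = between_cells - ss["methods"] - ss["replications"]

    df = {
        "methods": len(methods) - 1,
        "replications": len(periods) - 1,
        "interaction": (len(methods) - 1) * (len(periods) - 1),
        "within": len(everyone) - len(cells),
    }
    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms = {source: ss[source] / df[source] for source in df}
    f_ratio = {s: ms[s] / ms["within"]
               for s in ("methods", "replications", "interaction")}
    return ss, df, ms, f_ratio

# Invented scores: two methods, each replicated morning and afternoon.
scores = {
    ("historical", "am"): [30, 28, 33, 31],
    ("historical", "pm"): [29, 32, 30, 33],
    ("contemporary", "am"): [35, 33, 36, 34],
    ("contemporary", "pm"): [34, 37, 33, 36],
}
ss, df, ms, f_ratio = anova_2x2(scores)
print(ss, df, f_ratio)
```

Each F ratio would then be referred to a table of the F distribution with the corresponding degrees of freedom.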

The analysis of covariance was used in order 
to determine and to remove such effects as differ-
ences between the groups due to general college
aptitude and initial ability in scientific problem- 
solving from differences in final achievements 
that might otherwise be attributed to differences 
in methods. 
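
The adjustment that the analysis of covariance makes can be illustrated with a small sketch. It assumes a single covariate (a pretest) and a common within-group regression slope; the scores and the function names are hypothetical, and the study itself used two covariates and a full significance test that are not reproduced here.

```python
def mean(values):
    return sum(values) / len(values)

def pooled_within_slope(groups):
    """Regression of final score on pretest, pooled over groups.

    groups maps a group name to a list of (pretest, final) pairs.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

def adjusted_means(groups):
    """Final means corrected to a common pretest level (the grand pretest mean)."""
    b = pooled_within_slope(groups)
    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])
    return {
        name: mean([y for _, y in pairs]) - b * (mean([x for x, _ in pairs]) - grand_x)
        for name, pairs in groups.items()
    }

# Invented (pretest, final) pairs; the second group starts out ahead.
data = {
    "historical": [(10, 20), (12, 24), (14, 28)],
    "contemporary": [(14, 30), (16, 34), (18, 38)],
}
adj = adjusted_means(data)
print(adj)
```

In this invented case the raw final means differ by ten points, but after the initial advantage is removed the adjusted means differ by only two.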

In using the analysis of variance and covari-
ance for testing a null hypothesis, Fisher’s z
distribution or its non-logarithmic version, Sned-
ecor’s F distribution, became the proper model
or criterion for the test of significance of the data.
With a relatively small number of cases in each
group, a 5 percent level of significance was set.
That is, in order to reject the hypothesis that any
apparent differences between methods were due
to chance rather than to teaching method, the data
would have to be such as would arise by chance
alone in 5 percent or fewer of instances.

The secondary design was a 3 x 1 block intro- 
duced to enable comparisons of the pooled con- 
temporary groups to a supplementary lecture- 
demonstration group that used both historical and 
contemporary materials. F-tests were made 
with proper data from the main design to satisfy 
requirements of equality of variance for pooling 
in the secondary design. 







3. The Evaluation Instruments 


As the primary concern in this experimental
study of effective use of the laboratory period
was that of the development of scientific approach-
es to problems of everyday living, primary emph-
asis in testing was placed upon a written and a
performance test of scientific problem-solving de-
vised by the writer.

In the performance test in scientific problem- 
solving, the writer devised fourteen problem sit- 
uations. Since evaluation of scientific problem- 
solving abilities was the objective in this test, 
technical skill and technical knowledge were held 
to a minimum as factors in the items. Three
problems were invented to measure ‘‘accuracy
of observation’’; four set-ups, ‘‘determination
of relevant factors, clues and cues in problem
situations’’; and seven situations to measure ‘‘re-
sourcefulness in organizing relevant data, mater-
ials and procedures.’’

In the administration of the test, each set-up 
was at its own clearly indicated station with its 
own problem question and instructions. In the 
performance test the students were allowed three

warning bell to finish recording their answers. At
a second bell, the students rotated to the next sta-
tion, clearly marked. The following copies of the
direction cards are examples of each of the three
categories of these performance problems:


Station 2 
HOW MANY NAILS IN THIS CONTAINER?


a) Since you cannot expect to count all these nails
in a few moments, list the steps in proper or-
der that you would take in using this scale for
a fairly close approximation of all the nails.

b) What would be an assumption if you finally did
have an approximate count of the nails?


Station 6 
SYMMETRY OBSERVATION 


Indicate in your answer book the number of lines 
in B that are missing to keep it from being exact- 
ly like A. 


Station 11 
‘‘DANCING MOTHBALLS’’


On the basis of observation, what is your best
lead as to why mothballs are able first to rise
and then to sink in the liquid repeatedly?


The first example was one of ‘‘resourcefulness
in organizing relevant procedures’’ in line with
the purpose and the materials at hand. In the sec-
ond example, the symmetrical pattern with its
flaws was complex enough so that the student had
to work systematically and quickly in order to
obtain an ‘‘accuracy of observation’’ in the time
allotted. In the third example, careful observa-
tion alone even without previous knowledge of the
principle involved would inductively lead to the
‘‘relevant cue or factor in the situation.’’ Such
‘‘magic’’ exhibitions as this can occasionally be
seen as advertising displays in the windows of
shops.
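
The Station 2 card gives no scoring key, but one plausible line of attack on it can be sketched as a short calculation: count and weigh a small sample, weigh the whole lot, and divide. The sketch below is an illustration only; the weights and the function name are invented, and the assumption it rests on (that all the nails weigh about the same) is the kind of assumption part (b) of the card asks the student to name.

```python
def estimate_count(sample_count, sample_weight, total_weight):
    """Estimate how many items are in a lot from the weight of a counted sample.

    Assumes every item weighs about the same, and that total_weight
    excludes the weight of the container.
    """
    if sample_count <= 0 or sample_weight <= 0:
        raise ValueError("the sample must contain items of positive weight")
    unit_weight = sample_weight / sample_count
    return round(total_weight / unit_weight)

# Invented figures: 20 nails weigh 50 grams; the whole lot weighs 1250 grams.
estimate = estimate_count(sample_count=20, sample_weight=50.0, total_weight=1250.0)
print(estimate)
```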

In the construction of a written test for scien- 
tific thinking, the writer limited himself to eval- 
uation of the following as a basic and balanced 
pattern in evaluation of scientific, that is, of open- 
minded, systematic and critical thinking: 


1. Ability to determine best leads or best author- 
ity for problem solution. 

2. Ability to select and to organize relevant data
and procedures.

3. Ability to interpret data with proper consider-
ation for suspended judgment, hasty general-
izations and over-caution.

4. Ability to determine assumptions behind con-
clusions.


Open-mindedness of approach could be specifically
provided for by number (3) above and systemati-
zation by number (2). Critical abilities were in-
volved in all four sections. Also, forms most
suited for measurement of critical thinking and
attitudes were used, such as Ralph Tyler’s form of T,
PT, ID, PF, F for interpretation of data and for
associated attitudes of hasty generalization, sus-
pended judgment and over-caution.

Considerable efforts were made for validity
of these tests. A select group of graduate stud-
ents was used to establish an outside criterion
for the written test that resulted in a validity co-
efficient of .68. The performance test showed
a correlation of .95 to the written test. Compos-
ite reliability coefficients of .52 and .60, respec-
tively, were found for these two tests by the Jack-
son sensitivity method.

As pretests, the above written test for scien- 
tific thinking as well as the ACE Psychological 
Test, 1947, were used. The first served to re- 
move the effects of any inequalities of initial prob- 
lem-solving abilities, while the second did the 
same for effects of inequality of general mental 
or academic ability. As secondary tests in sub- 
ject matter content, outside criterion tests were 
introduced during the middle and at the end of the 
study. The mid-study test was found to be unre- 
liable for our particular purpose. The end-study 
test had a reliability of .84. A mid-study written 
laboratory test emphasizing recall and applica- 
tions of laboratory apparatus, methods and con- 
clusion formation had a reliability coefficient of 
.72 by Jackson’s methods. 

All of the measures used were significantly 
reliable at 1% levels. Some reliabilities were 
lower than others. This is due largely to the 
fact that some objectives are more difficult to 
measure than others. The effect of low reliabil- 
ities on the relationships studied will be to atten- 
uate them. This is brought about by the fact that 
the mean square of error, which is used in all 
the tests of significance, is enhanced as compared 
with what it would be were perfectly reliable in- 
struments available. 
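
The attenuating effect of unreliable measures is perhaps most easily seen in the classical relation for correlations, in which the expected observed correlation equals the correlation between true scores multiplied by the square root of the product of the two reliabilities. The sketch below applies that relation; it is not the computation used in the study, which worked through the error mean square, and the true correlation of .50 is purely hypothetical. The reliabilities .84 and .72 are those reported above for the end-study test and the mid-study laboratory test.

```python
from math import sqrt

def attenuated_r(true_r, reliability_x, reliability_y):
    """Expected observed correlation between two fallible measures
    under the classical test theory model."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * sqrt(reliability_x * reliability_y)

# Hypothetical true correlation; reliabilities as reported in the text.
observed = attenuated_r(0.50, 0.84, 0.72)
print(round(observed, 3))
```

With perfectly reliable instruments the function returns the true correlation unchanged, which corresponds to the comparison drawn in the paragraph above.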


4. The Experimental Procedures 


a. The Population. The subjects of the study
were all students regularly taking the first two
quarters of the physical science sequence of the
Physical World, Natural Science IV-V in the Sci-
ence, Literature and Arts College of the Univer-
sity of Minnesota during the fall of 1950 and winter
of 1951. Of the original eighty-seven students start-
ing the course, 88 percent were freshmen and
sophomores, and 66 percent were men students. 
All students attended the same three lecture per- 
iods each week. The laboratory or demonstra- 
tion time was the single two-hour period a week 
in which the differentiation of treatment took place. 

b. Sampling. The grouping procedure had as 
its basis the individual class schedules of all stu- 
dents. Examination of these schedules revealed 
that Tuesdays and Thursdays in each case from 
10:00-12:00 a.m. and 1:00-3:00 p.m. were hours 
in which most students were assignable to labor- 
atory periods with least program conflicts. This 
parallelism of time on Tuesday and Thursday as 
well as symmetry in respect to days of the school 
week was particularly advantageous from the 
standpoint of controls. Thus, although program 
conflicts prevented any possibility of a completely 
random assignment to laboratory periods, a large 
majority or 67 percent of the course population 
were found to have had equal opportunity to be in 
the historical or contemporary groups comprising 
the main study. On a completely random basis, 
that is by the flip of a coin, the Tuesday groups 
were designated as historical and the Thursday 
as contemporary. However, those students who 
could not be accommodated by the four Tuesday 
and Thursday laboratory periods made up a fifth 
Wednesday group. This fifth group was desig- 
nated as the demonstration group and was used 
in the secondary design for supplementary infor-
mation.

c. Control of Non-Experimental Factors. Con-
trol of non-experimental factors was attempted
by the following measures:





1. All groups had the same lecture period and 
instructor three times a week. This lecture per- 
iod included lectures, lecture-demonstrations, 
discussions and examinations based on the class 
text, the lectures and the accompanying demon- 
strations. 

2. All groups for all class and laboratory work 
used the same text, Fundamentals of Physical 
Science, by K. B. Krauskopf.3 This text emph-
asized both historical and contemporary approach-
es and materials in science, minimizing the pos-
sibility of any text advantages accruing to one
group or the other. No laboratory manuals were 
used for any of the groups. 





3. K. B. Krauskopf, Fundamentals of Physical Science (New York: McGraw-Hill, 1948).

3. Parallel morning and afternoon groups for
the historical and the contemporary treatments 
on Tuesday and Thursday prevented any variables 
due to the time of the day. Tuesdays and Thurs- 
days were symmetrical in respect to days of the 
school week. 

4. Use of analysis of covariance permitted 
general aptitude and previous problem-solving 
abilities to operate as they normally would and 
then the removal of any inequalities of their ef- 
fects statistically. 

5. All the four historical and contemporary 
laboratory groups of the main design had the same 
instructor, the present writer, during the labor- 
atory hours. The demonstration group, however, 
had two instructors, Dean J. W. Buchta during 
the first quarter, and the writer all of the second 
quarter except for one meeting. 

6. Records for both quarters showed attend- 
ance to be about equally good for all groups es- 
pecially after the first week or two when the class 
programs and routines became established. 

7. The topics of all groups were the same and 
treated concurrently even though the methods 
were differentiated. 

8. The same problem-solving conceptual out- 
comes were set up as objectives for both the his- 
torical and the contemporary groups and develop- 
ed in class as generalizations. 

9. In the four problem-solving groups, there 
was the same number of problems, fourteen, for 
solution during the two quarters. 

10. The problems and the materials of the 
replicated afternoon groups were a duplication 
of those of the morning groups. 


d. Differentiation of Treatment and Class Pro- 
cedures. Differentiation of methods was essen- 
tially in respect to problem treatment and mater- 
ials. The laboratory courses consisted of four- 
teen contemporary problems parallel to fourteen 
case histories, all of which, in topic, were sim- 
ilar and concurrent to the accompanying demon- 
stration group. The historical groups, however, 
involved historically centered problems and lab- 
oratory equipment; the contemporary groups in- 
volved contemporarily focused problems and equip- 
ment; and the demonstration group was apparatus 
centered in discussions, questions and problems 
with both contemporary and historical equipment. 
For example, in connection with the refraction 
of light, the historical groups worked on the Gal- 
ilean telescope and its significance as empirical 
evidence in the Ptolemaic-Copernican issue of 
‘‘Does the Sun Revolve Around the Earth or the 
Earth Around the Sun?’’ At the same time, the
contemporary groups were considering their
own eyes as optical instruments, particular op-
tical defects that they may have developed, and
correction by lenses. The demonstration group
experienced standard classroom refraction dem-
onstrations.

Class procedures, consequently, involved, on 
the one hand, historical critical experiment dup- 
lication by individual students with emphasis upon 
the problem-solving involved. On the other hand, 
for the contemporary groups, they involved sci- 
entific resolution and evaluation of class argu- 
ments and discussion by individual experimenta- 
tion that was as much instructor-class planned 
as possible. The class arguments were initiated 
by newspaper clippings, challenging statements, 
quotations, questions, or the like. The labora- 
tory materials and the manipulation were used 
in a larger picture of scientific problem-solving. 
For the demonstration group, the laboratory dem-
onstrations of the laboratory period were a con-
centrated reinforcing supplementation of the
lecture-demonstrations of the regular lecture
periods, but one in which the demonstrator pro-
ceeded from one piece of apparatus to another in
a systematic and logical development of the pert-
inent facts, principles and applications of the top-
ic at hand. Further questions, problems and dis-
cussions often arose inductively in the process of
demonstrating the various pieces of apparatus and
their phenomena.


5. Analysis of the Experimental Results 


Raw scores existed for two preliminary meas- 
ures, two mid-study tests and three final achieve- 
ment tests already described. These scores be- 
came the primary data for statistical analyses in 
the two designs. Analyses were also made for 
the four separate sections of the written and the 
three separate sections of the performance tests
in scientific thinking. The scores of the prelim- 
inary measures were used for analysis of co- 
variance in order to remove that part of any sig- 
nificant differences between methods that were 
attributable to differences in initial scores. In- 
volved, of course, in each of the analyses for 
each design were the following processes: calcu- 
lation of the sums of squares and cross-products 
of the particular test scores, testing of the equal-
ity of the within-group variances through Welch's
L-test and Nayer's tables, setting up the analysis
of variance table, adjusting the sums of squares
for the variables to be partialled out, applying
the F-test, testing the homogeneity of the within-group
regression coefficients, setting up the analysis
of variance and covariance tables, and testing
the null hypothesis by the F-test. Also, equality
of variance between replicated morning and after-
noon groups became the basis for separately pool-
ing contemporary and historical groups in the
3 x 1 secondary design. 

As an example, Table I contains the analysis
of variance and covariance used in the main 2 x 2
design to determine any significant differences
between the historical and contemporary groups
in respect to the entire final performance test
on scientific thinking. Although on this test the
contemporary groups had a mean of over four
points higher than the 31.10 of the historical
groups, this was not significant. This was shown
by the F value of 2.805 which on Snedecor's Table
is too low for significance at a 5 percent level.
The F value of 2.49 for the ‘‘between replica-
tions’’ variance was also not significant at the 5
percent level, and resulted in acceptance of the 
null hypothesis that the afternoon replicated 
groups had variances that were homogeneous with 
the variances of the morning groups. The F value 
of 1.591 likewise resulted in acceptance of the 
null hypothesis as to interaction between time of 
day and method. 

When separate analyses were made of each of 
the three separate parts of the above performance 
test on scientific thinking, an F value of 5.955 in
part 2 only, on ability to detect relevant factors
and clues in a problem situation, was significant
at a 5 percent level in favor of the contemporary 
method. This significance in part 2 was main- 
tained in the secondary 3 x 1 design that included 
the demonstration group. 

The null hypotheses were accepted in each of 
the two designs on the written scientific thinking 
test, on the two outside criterion tests for science 
subject matter achievement and for the mid-study 
laboratory test. 

In the use of the t-tests of significance to com-
pare the written pretests to final tests in scien-
tific thinking, gains by each of the three groups
were significant between the 5 percent and 10 percent
levels.
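
A t-test of pretest-to-final gains can be sketched as below. The text does not say whether the gains were tested as paired differences; the sketch assumes that they were, since the same students took both tests. The scores are invented and the function name is hypothetical.

```python
from math import sqrt

def paired_t(pre, post):
    """t statistic and degrees of freedom for the mean gain of the same
    students measured twice (post minus pre)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched scores for at least two students")
    gains = [b - a for a, b in zip(pre, post)]
    n = len(gains)
    mean_gain = sum(gains) / n
    # Unbiased variance of the gains, on n - 1 degrees of freedom.
    var_gain = sum((g - mean_gain) ** 2 for g in gains) / (n - 1)
    return mean_gain / sqrt(var_gain / n), n - 1

# Invented pretest and final scores for five students.
pretest = [20, 22, 19, 24, 21]
final = [23, 24, 20, 27, 25]
t_value, dof = paired_t(pretest, final)
print(round(t_value, 3), dof)
```

The resulting t would be referred to a t table with the stated degrees of freedom to find the level at which the gain is significant.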


6. Summary and Conclusions 


The study briefly described here was designed 
primarily to investigate the comparative values 
of an historical as against a contemporary prob- 
lem solving use of the college physical science 
laboratory period for general education. Each 
treatment involved using the science laboratory 
period for unique possibilities that it provides in 
the employment of first hand materials and evi- 
dence in scientific problem solving. 

The population consisted of all students, most-
ly freshmen and sophomores, regularly enrolled
in the Natural Science IV and V sequence of the
Physical World of physics, astronomy and chem-
istry at the University of Minnesota during the
fall of 1950 and winter of 1951. The original eighty-
seven students were divided into five groups, two
historical, two contemporary and one demonstra- 
tion group. All students attended the same three 
lecture periods each week. The laboratory or 
demonstration time was a single two-hour period 
a week.

Differentiation of methods was essentially in
respect to problem treatment and materials. The
laboratory courses consisted of fourteen contemp-
orary problems parallel to fourteen case histor-
ies, all of which, in topic, were similar and con-
current to the accompanying demonstration group.
The historical groups, however, involved histor- 
ically centered problems and laboratory equip- 
ment; the contemporary groups involved contemp- 
orarily focused problems and equipment; and the 
demonstration group was apparatus centered in 
discussions, questions and problems with both 
contemporary and historical equipment. 

Statistical treatment was based upon random-
ized sampling, equalization of groups through
Fisher's analysis of variance and covariance,
and provided for validity and reliability of meas-
urement. The replicated historical and contemp-
orary groups formed a 2 X 2 randomized block
design. The demonstration group when combined
with the pooled historical and pooled contempor-
ary groups formed a secondary 3 x 1 randomized
block. Preliminary data were obtained through 
the 1947 ACE College Aptitude Test and a pretest 
on scientific thinking compiled by the writer. The 
final tests included (1) the written pretest repeat- 
ed, (2) a ‘‘practical’’ or performance test based 
on fourteen actual problem situations for evalua- 
tion of openminded, systematic and critical think- 
ing, and (3) outside criterion tests on science sub- 
ject matter. 

Within the limits of this study, the conclusions 
are valid that, 


a. The college physical science laboratory 
period can be devoted to general education 
outcomes of scientific problem-solving with- 
out loss in subject matter achievement. 


b. There was an advantage to contemporary
laboratory groups over historical labora- 
tory as well as lecture-demonstration 
groups in respect to determining relevant 
factors and clues in actual contemporary 
problem situations. 


7. Implications and Recommendations 


Among implications and recommendations to 
be drawn are the following: 


a. Further indication here that scientific think- 
ing and scientific approach to problems can 
be learned through direct class room plan- 
ning and procedures for the purpose. 


b. Merely a first indication here as to whether
college teachers can better develop prob-
lem-solving abilities, attitudes and re-
sourcefulness for contemporary living by
appreciatively and realistically duplicating
and analyzing the problem solving of some
of the outstanding scientific thinkers of the
past, or by working directly with students
in immediate contemporary problems of
science in various areas of living. In this
first indication, the only significant advan-
tage was in favor of the contemporary-
laboratory group.


c. Whatever results we have seen here are
based upon the one particular historical
and the one particular contemporary prob-
lem-solving method developed here. There
are innumerable variations of such
approaches to be tried. For example: What
could be accomplished comparatively with
historical materials used only to the extent
that they throw direct light on contempor-
ary problems? This, of course, would
emphasize a developmental historical treat-
ment that opens up into the modern scene
rather than a more episodic case-history
treatment.


d. We further recommend comparative studies
of historical and contemporary methods for
general education purposes in which not only
the laboratory periods, but all class activ- 
ities emphasize problem-solving processes. 
We can visualize, for example, a contemp- 
orary problem-solving treatment in which 
there is no set laboratory period, but in 
which laboratory or demonstration apparat- 
us is brought into play right there and then 
whenever necessary or effective as evidence 
in class discussions, disputes or other ac- 
tivities. 


e. Although a performance test in scientific
thinking was used, it was still under arti-
ficial, controlled conditions of the class-
room. There is room for combined, con-
sistent efforts in further development of
tests that approximate ever more closely actual
living conditions or responses. For ex-
ample, there might be objective testing
techniques centered about evaluation of ac-
tual current newspaper advertising, editor-
ials or the like.


f. In evaluative studies in education, the more
the experimenter can permit his classes
to operate under normal teaching conditions
instead of under controlled conditions typ-
ical of the experiment only, the more valid
should be his results. Therefore, further
development of statistical techniques that
will enable a larger number of experiment-
al variables to operate while determining
and removing their effects from results
should be of additional benefit to educational
research.


g. A pertinent question for further study re-
garding problem solving objectives and
outcomes in the classroom has reference
to cumulative educational effects. To what
extent can a single course or section of a
course with emphasis on scientific thinking
procedures and outcomes, among other things,
overcome habits of many years of past
school experience of students based upon
processes that involve essentially memor-
ization and recall? To what extent would
learning of a problem-solving type be accel-
erated with an accumulation of problem-
solving experience in the students’ educa-
tional background?


h. We might pose still another question for
further study: To what extent and under 
what conditions is learning of a scientific 
problem-solving nature permanent, and 
what classroom procedures are more effec- 
tive in such more permanent learning? 














THE STATISTICAL INTERPRETATION OF 
DEGREES OF FREEDOM 


WILLIAM J. MOONAN 
University of Minnesota 
Minneapolis, Minnesota 


1. Introduction 


THE CONCEPT of ‘‘degrees of freedom’’ has 
a very simple nature, but this simplicity is not 
generally exemplified in statistical textbooks. It 
is the purpose of this paper to discuss and define 
the statistical aspects of degrees of freedom and 
thereby clarify the meaning of the term. This 
shall be accomplished by considering a very elem-
entary statistical problem of estimation and pro-
gressing onward through more difficult but com-
mon problems until finally a multivariate prob-
lem is used. The available literature which is devot-
ed to degrees of freedom is very limited. Some
of these references are given in the bibliography 
and they contain algebraic, geometrical, physical 
and rational interpretations. The main emphasis 
in this article will be found to be on discovering 
the degrees of freedom associated with certain 
standard errors of common and useful significance 
tests, and that for some models, parameters are
estimated directly or indirectly, by certain de-
grees of freedom. The procedures given here
may be put forth completely in the system of es-
timation which utilizes the principle of least
squares. The applications given here are special
cases of this system.


2. 


In most statistical problems it is assumed that
n random variables are available for some anal-
ysis. With these variables, it is possible to con-
struct certain functions called statistics with which
estimations and tests of hypotheses are made. As-
sociated with these statistics are numbers of de-
grees of freedom. To elaborate and explain what
this means, let us start out with a very simple
situation. Suppose we have two random variables,
$y_1$ and $y_2$. If we pursue an objective of statistics,
which is called the reduction of data, we might
construct the linear function, $Y_1 = \tfrac{1}{2}y_1 + \tfrac{1}{2}y_2$.
This function estimates the mean of the popu-
lation from which the random variables were
drawn. For that matter so does any other linear
function of the form, $Y_1 = a_{11}y_1 + a_{12}y_2$, where
the a’s are real equal numbers. When the coef-
ficients of the random variables are equal to the
reciprocal of the number of them, the statistic de-
fined is the sample mean. This statistic may be
chosen here for logical reasons, but its specifi-
cation really comes from the theory of estimation
mentioned before. We also could construct an-
other linear function of the random variables,
$Y_2 = \tfrac{1}{2}y_1 - \tfrac{1}{2}y_2$.

This contrast statistic is a measure of how 
well our observations agree since it yields ameas- 
ure of the average difference of the variables. 
These statistics, Y, and Y,, have the valuable 
property that they contain all the available inform- 
ation relevant to discerning characteristics of the 
population from which the y’s were drawn. This 
is true because it is possible to reconstruct the 
original random variables from them. Clearly, 
Y, = Y2 = y, and Y, - Ypg = yg. We discern that 
we have constructed a pair of statistics whichare 
reduceable to the original variables, but they state 
the information contained in the variables ina 
more useful form. There are certain other char- 
acteristics worth noticing. The sum of the coef- 
ficients of the random variables of Y, equals zero 
and the sum of the products of the corresponding 
coefficients of the random variables of Y, and Y, 
equals zero. That is, (#)(#) + (#)(-#) = 0. This 
latter property is known as the quasi-orthogonal- 
ity of Y, and Y,. This property is analogous to 
the property of independence which is associated 
with the random variables. 
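
The reduction described in this section can be checked numerically. The following Python sketch is an illustration only: the two observations are borrowed from the example of section 3, and the variable names are assumptions made for the purpose.

```python
from fractions import Fraction as F

# Two observations (illustrative values; any two numbers would serve).
y1, y2 = F(24), F(18)

# Coefficients of the mean statistic Y1 and of the contrast statistic Y2.
a1 = (F(1, 2), F(1, 2))
a2 = (F(1, 2), F(-1, 2))

Y1 = a1[0] * y1 + a1[1] * y2
Y2 = a2[0] * y1 + a2[1] * y2

# The original observations are recovered from the two statistics.
recovered = (Y1 + Y2, Y1 - Y2)

# The contrast coefficients sum to zero, and the products of corresponding
# coefficients of Y1 and Y2 sum to zero (quasi-orthogonality).
contrast_sum = sum(a2)
cross_product = sum(p * q for p, q in zip(a1, a2))

print(Y1, Y2, recovered, contrast_sum, cross_product)
```

Exact fractions are used so that the zero sums are exact rather than approximate.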

In changing our random variables to the statistics
we have performed a quasi-orthogonal transformation.
Quasi-orthogonal transformations are of special
interest because the statistics to which they lead
have valuable properties. In particular, if our data
are composed of random variables from a normal
population, these statistics are independent in the
probability sense (i.e., stochastically independent)
or, in other words, they are uncorrelated. That remark
has a rational interpretation which says that the
statistics used are not overlapping in the information
they reveal about the data. As long as we preserve the
property of orthogonality we will be able to reproduce
the original random variables at will. This
reproductive property is guaranteed when the
coefficients of the random variables of the statistics
are mutually orthogonal (i.e., every statistic is
orthogonal to every other one). Since the determinant
of such coefficients does not vanish when this is true,
our equations (statistics) have a solution which is the
explicit designation of the original random variables.
The determinant for this problem is

$$\begin{vmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 \end{vmatrix} = (\tfrac12)(-\tfrac12) - (\tfrac12)(\tfrac12) = -\tfrac12 \neq 0. \tag{1}$$

There is another valuable property of quasi-orthogonal
transformations which we shall come to a little later.


3. 

If we have three observations, we can construct
three mutually quasi-orthogonal statistics. Again
we might let $Y_1$ be the mean of the random variables
with $Y_2$ and $Y_3$ as contrast statistics. Specifically,
let $Y_1 = \tfrac13 y_1 + \tfrac13 y_2 + \tfrac13 y_3$. There exist
two other mutually quasi-orthogonal linear statistics
which might be chosen, and it can be said that we
enjoy the freedom of two choices in the statistics
we actually use to summarize the data. We could let

$$Y_2 = \tfrac12 y_1 - \tfrac12 y_2 + 0\,y_3; \qquad Y_3 = \tfrac13 y_1 + \tfrac13 y_2 - \tfrac23 y_3, \tag{2}$$

or,

$$Y_2 = 0\,y_1 + \tfrac12 y_2 - \tfrac12 y_3; \qquad Y_3 = \tfrac23 y_1 - \tfrac13 y_2 - \tfrac13 y_3. \tag{3}$$

(It can be shown that there exists an infinity of
possible choices!)

Either pair of the statistics which we have
chosen together with $Y_1$ can be shown to reproduce
the random variables $y_1$, $y_2$ and $y_3$. As a
consequence, they possess all the information that
the original variables do. In general, if we have
n random variables, we might construct a statistic
representing the sample mean (which estimates
$\theta$) and have n - 1 choices or degrees of freedom
for other mutually quasi-orthogonal linear statistics
to summarize the data. Each degree of freedom
then corresponds to a mutually quasi-orthogonal
linear function of the random variables. In
general, the term degree of freedom does not
necessarily refer to a linear function which is
orthogonal to all the others which are or may be
constructed; however, in common usage it usually
does refer to quasi-orthogonal linear functions.
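
The freedom of choice among contrast statistics can be illustrated with a short Python sketch. The helper names are hypothetical, and the two pairs of contrast coefficients are one reading of the alternatives offered in this section; the check is only that each pair, together with the mean, forms a mutually quasi-orthogonal set.

```python
from fractions import Fraction as F
from itertools import combinations

def dot(u, v):
    """Sum of products of corresponding coefficients."""
    return sum(p * q for p, q in zip(u, v))

def mutually_orthogonal(rows):
    """True when every pair of coefficient rows has a zero sum of products."""
    return all(dot(u, v) == 0 for u, v in combinations(rows, 2))

mean_row = (F(1, 3), F(1, 3), F(1, 3))

# Two of the infinitely many admissible pairs of contrasts for n = 3.
choice_a = [(F(1, 2), F(-1, 2), F(0)), (F(1, 3), F(1, 3), F(-2, 3))]
choice_b = [(F(0), F(1, 2), F(-1, 2)), (F(2, 3), F(-1, 3), F(-1, 3))]

results = [mutually_orthogonal([mean_row] + c) for c in (choice_a, choice_b)]
print(results)
```

Each choice contains n - 1 = 2 contrasts, which is the number of degrees of freedom left after the mean has been specified.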

When the observational model we are working
with contains only one parameter which is estimated
by a linear function, there is little purpose in
specifying the remaining degrees of freedom in the
form of contrasts. For instance, if our model is
$y_i = \theta + e_i$, where $e_i$ is normally distributed
with zero mean and variance $\sigma^2$, i.e., $N(0, \sigma^2)$,
and $i = 1, \ldots, n$, we would also like to estimate
$\sigma^2$. Unfortunately, this parameter is not estimated
directly by the linear functions other than $Y_1$.

Before proceeding, the other property of
quasi-orthogonal transformations will be discussed. One
might inquire about the relationship of the number
called the sum of squares of the $y_i$'s to the
sum of squares of the $Y_j$'s. If we require this
number to be invariant, then

$$\sum_{j=1}^{n} Y_j^2 = \sum_{i=1}^{n} y_i^2. \tag{4}$$

For two statistics, we can write in matrix notation,

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad \text{i.e.,} \quad Y = Ay. \tag{5}$$

Then, $\sum_{j=1}^{2} Y_j^2 = Y'Y = (Ay)'(Ay) = y'A'Ay$.

Now if $\sum_{j=1}^{2} Y_j^2 = Y'Y$ is to equal $\sum_{i=1}^{2} y_i^2 = y'y$,
then A'A is a two row-two column matrix with
ones in the main diagonal and zeros elsewhere, i.e.,

$$A'A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

When a matrix, A', multiplied by its transpose, A,
equals a unit matrix, A' is called an orthogonal
matrix and the $y_i$'s which are transformed to the
$Y_j$'s by this matrix are said to be orthogonally
transformed. You will notice that the matrix of the
coefficients of $Y_1$ and $Y_2$ of section 2 is not an
orthogonal matrix since

$$A'A = \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac12 \end{pmatrix}.$$

If the coefficients of the Y's had been $1/\sqrt{2}$'s
instead of $\tfrac12$'s then A' would be an orthogonal
matrix. Because the matrix of our transformations
does not fulfill the accepted mathematical definition
of orthogonal transformations, but one very
much like them, they are termed, for the purposes
of this paper, quasi-orthogonal transformations.
However, it seems unnatural to beginning students
to define $Y_1$ as $Y_1 = \tfrac{1}{\sqrt{2}} y_1 + \tfrac{1}{\sqrt{2}} y_2$.
Actually, for $Y_1$ any linear function with positive
and equal coefficients would serve as well as $Y_1$
itself, for they would be logically equivalent and
mathematically reducible to the usual definition of
the sample mean. If we are to use the common-sense
statistics, obviously something must be done in order
to preserve the property (4). One thing that can
be done is to change our definition of what the sum
of squares of the jth linear function, $Y_j$, would
be. Let us define the sum of squares associated
with the linear function $Y_j$ to be

$$SS(Y_j) = \frac{(a_{1j} y_1 + a_{2j} y_2 + \cdots + a_{nj} y_n)^2}{a_{1j}^2 + a_{2j}^2 + \cdots + a_{nj}^2}. \tag{6}$$
Using this definition instead of just the numerator
of it, property (4) will be preserved. As an
illustration of this formula let j = 1 and
$Y_1 = \tfrac13 y_1 + \tfrac13 y_2 + \tfrac13 y_3$; then

$$SS(Y_1) = \frac{(\tfrac13 y_1 + \tfrac13 y_2 + \tfrac13 y_3)^2}{(\tfrac13)^2 + (\tfrac13)^2 + (\tfrac13)^2}, \tag{7}$$

or for n random variables, $SS(Y_1) = \left(\sum_{i=1}^{n} y_i\right)^2 / n$.
Further, if $y_1 = 24$, $y_2 = 18$ and $y_3 = 36$, then
$SS(Y_1) = 2028$, and if we use (2), then $SS(Y_2) = 18$
and $SS(Y_3) = 150$. Note that
$SS(Y_1) + SS(Y_2) + SS(Y_3) = 2196$ and that

$$\sum_{i=1}^{3} y_i^2 = 24^2 + 18^2 + 36^2 = 2196.$$

Thus the sum of squares of the linear functions
equals the sum of squares of the random variables.
These results can, of course, be generalized to
the n-variable case. Clearly, the sum of squares
of the two linear functions $Y_2$ and $Y_3$ equals the
total sum of squares of the random variables minus
the sum of squares associated with $Y_1$, so:

$$SS(Y_2) + SS(Y_3) = \sum_{i=1}^{3} y_i^2 - SS(Y_1), \tag{8}$$

or, in general,

$$SS(Y_2) + \cdots + SS(Y_n) = \sum_{i=1}^{n} y_i^2 - SS(Y_1). \tag{9}$$
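
The arithmetic of this example may be reproduced with a few lines of Python. This is a minimal sketch of definition (6), assuming the contrast coefficients of (2); the function name `ss` is hypothetical.

```python
from fractions import Fraction as F

def ss(coeffs, y):
    """Sum of squares of a linear function as in definition (6):
    the squared linear function divided by the sum of squared coefficients."""
    numerator = sum(a * v for a, v in zip(coeffs, y)) ** 2
    denominator = sum(a * a for a in coeffs)
    return numerator / denominator

y = [F(24), F(18), F(36)]

a1 = [F(1, 3), F(1, 3), F(1, 3)]    # mean
a2 = [F(1, 2), F(-1, 2), F(0)]      # first contrast of (2)
a3 = [F(1, 3), F(1, 3), F(-2, 3)]   # second contrast of (2)

parts = [ss(a, y) for a in (a1, a2, a3)]
total = sum(v * v for v in y)

print(parts, sum(parts), total)
```

The three component sums of squares add to the total sum of squares of the observations, which is property (4) under the revised definition.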


Now define the sample variance of a set of linear
functions as the average of the sums of squares
associated with the contrast linear functions. We
see that for the special case where n = 3, our
divisor for this average will be 2 because there are
two sums of squares to be averaged in (8). This
argument accounts for the degrees of freedom
divisor which has been traditionally difficult to
explain to beginning students in the formula

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{10}$$

The statistic $Y_1$ accounts for one degree of freedom
in the numerator of the formula for Student's
t and the denominator is a function of (10) and is
associated with n - 1 degrees of freedom. Note
that it is not necessary to construct the contrast
degrees of freedom to obtain the sums of squares
associated with them.
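
That the contrast sums of squares need not be constructed individually can be seen in the following Python sketch, which uses the three observations of the preceding example; the variable names are assumptions.

```python
from fractions import Fraction as F

y = [F(24), F(18), F(36)]
n = len(y)

# Sum of squares carried by the mean, SS(Y1) = (sum of y)^2 / n.
ss_mean = sum(y) ** 2 / n

# Sum of squares of all n - 1 contrasts, obtained by subtraction.
contrast_ss = sum(v * v for v in y) - ss_mean

# Averaging over the n - 1 contrast degrees of freedom.
s2_from_contrasts = contrast_ss / (n - 1)

# The usual formula for the sample variance.
y_bar = sum(y) / n
s2_usual = sum((v - y_bar) ** 2 for v in y) / (n - 1)

print(contrast_ss, s2_from_contrasts, s2_usual)
```

Both routes give the same variance, so the divisor n - 1 is simply the count of contrast degrees of freedom being averaged.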
4. 

The problem just presented is a simple analysis
of variance (anova) type and leads to the test
of the hypothesis, $\theta = \theta_0$. The next logical
elaboration would be to consider Fisher's t test of the
hypothesis $\theta_1 = \theta_2$. The observation model is
$y_{ik} = \theta_k + e_{ik}$, where $i = 1, \ldots, n_k$; $k = 1, 2$ and
the $e_{ik}$ are $N(0, \sigma^2 = \sigma_1^2 = \sigma_2^2)$. The orthogonal linear
functions which estimate the parameters $\theta_1$ and
$\theta_2$ are, respectively,

$$Y_1 = \frac{1}{n_1} y_{11} + \cdots + \frac{1}{n_1} y_{n_1 1} + 0\,y_{12} + \cdots + 0\,y_{n_2 2}$$

and

$$Y_2 = 0\,y_{11} + \cdots + 0\,y_{n_1 1} + \frac{1}{n_2} y_{12} + \cdots + \frac{1}{n_2} y_{n_2 2}.$$

Then,

$$\begin{aligned}
SS(Y_3) + \cdots + SS(Y_{n_1+n_2}) &= \sum_{k=1}^{2} \sum_{i=1}^{n_k} y_{ik}^2 - SS(Y_1) - SS(Y_2) \\
&= \sum_{k=1}^{2} \sum_{i=1}^{n_k} y_{ik}^2 - \frac{\left(\sum_{i=1}^{n_1} y_{i1}\right)^2}{n_1} - \frac{\left(\sum_{i=1}^{n_2} y_{i2}\right)^2}{n_2} \\
&= \sum_{i=1}^{n_1} (y_{i1} - \bar{y}_1)^2 + \sum_{i=1}^{n_2} (y_{i2} - \bar{y}_2)^2,
\end{aligned} \tag{11}$$

and if we average these sums of squares, the
appropriate denominator will be $n_1 + n_2 - 2$. The
numerator of Fisher's t is $Y_1 - Y_2$ under the null
hypothesis $\theta_1 = \theta_2$ and the denominator is a
function of (11) and is associated with $n_1 + n_2 - 2$
degrees of freedom.
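
A small numerical illustration of this section follows, written in Python. The two groups of observations are hypothetical, and the t ratio is computed in the familiar pooled-variance form, which is assumed here to be the function of (11) that the text has in mind.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical observations for the two groups.
g1 = [F(24), F(18), F(36)]
g2 = [F(11), F(5), F(8)]
n1, n2 = len(g1), len(g2)

# Sums of squares carried by the two group means, SS(Y1) and SS(Y2).
ss_y1 = sum(g1) ** 2 / n1
ss_y2 = sum(g2) ** 2 / n2

# Remaining sum of squares, obtained by subtraction as in (11).
total_ss = sum(v * v for v in g1 + g2)
within_ss = total_ss - ss_y1 - ss_y2

# Averaging over the n1 + n2 - 2 remaining degrees of freedom.
df = n1 + n2 - 2
pooled_variance = within_ss / df

# Pooled-variance t ratio (standard form, assumed for this sketch).
mean_difference = sum(g1) / n1 - sum(g2) / n2
t = float(mean_difference) / sqrt(float(pooled_variance) * (1 / n1 + 1 / n2))

print(within_ss, df, pooled_variance, t)
```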


5. 

As another example, we might consider the
regression model, $y_i = \theta + \beta(x_i - \bar{x}) + e_i$, where
$i = 1, \ldots, n$ and the $e_i$ are $N(0, \sigma^2_{y \cdot x})$. The linear
functions of interest are

$$Y_1 = \frac{1}{n} y_1 + \cdots + \frac{1}{n} y_n \quad \text{and} \quad Y_2 = \frac{(x_1 - \bar{x})}{n} y_1 + \cdots + \frac{(x_n - \bar{x})}{n} y_n.$$

For these functions, $Y_1$ is used to estimate the
mean, $\theta$, and $Y_2$, being an average product of the
deviation x's and concomitant y's, leads to an
estimate of the unknown constant of proportionality,
$\beta$. This is rationally and algebraically true, since
if $y_i$ and $(x_i - \bar{x})$ tend to proportionately increase
and decrease simultaneously or inversely, $Y_2$ will
tend to increase absolutely. However, if $y_i$ and
$(x_i - \bar{x})$ do not proportionately rise and fall
simultaneously or inversely, $Y_2$ will tend to be zero.
This can be shown by the following table. In this
table, several sets of x's designated by $x_{ik}$,
$k = 1, \ldots, 3$, each of which has the same mean, 4,
are substituted in $Y_2$ together with their
corresponding $y_i$'s. The values of the $Y_{2k}$ are given
in the bottom line of Table I.





JOURNAL OF EXPERIMENTAL EDUCATION 


TABLE I

EVALUATION OF $Y_2$ FOR CHANGING VALUES OF $x_i$ IN THE SIMPLE
REGRESSION MODEL





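The sums of squares associated with these functions are

$$SS(Y_1) = \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \quad \text{and} \quad SS(Y_2) = \frac{\left[\sum_{i=1}^{n} (x_i - \bar{x}) y_i\right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{12}$$

Consequently, to find the sample estimate of $\sigma^2_{y \cdot x}$,

$$\begin{aligned}
SS(Y_3) + \cdots + SS(Y_n) &= \sum_{i=1}^{n} y_i^2 - SS(Y_1) - SS(Y_2) \\
&= \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} - \frac{\left[\sum_{i=1}^{n} (x_i - \bar{x}) y_i\right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
&= \sum_{i=1}^{n} (y_i - \bar{y})^2 - b \sum_{i=1}^{n} (x_i - \bar{x}) y_i = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\end{aligned} \tag{13}$$

where b is the usual regression coefficient for
predicting y from a knowledge of x and $\hat{y}_i$ is the
predicted value of $y_i$. Again, to find the variance
associated with these sums of squares we divide
their sum by the number of degrees of freedom from
which these sums of squares were derived. This
number is n - 2. Under the null hypothesis,
$\beta = 0$, the denominator of the t test,
$t = b / S.E._b$, has (n - 2) degrees of freedom and
the numerator is associated with one degree of
freedom.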
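
The subtraction argument for the regression model can be verified numerically. In the Python sketch below the x's are chosen to have mean 4, as in the table described above, but both the x and y values are hypothetical, as are the variable names.

```python
from fractions import Fraction as F

x = [F(1), F(3), F(5), F(7)]        # hypothetical x's with mean 4
y = [F(24), F(18), F(36), F(30)]    # hypothetical y's
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n
dev = [xi - x_bar for xi in x]

# Sum of squares for the mean and for the regression function.
ss_y1 = sum(y) ** 2 / n
sxy = sum(d * yi for d, yi in zip(dev, y))
sxx = sum(d * d for d in dev)
ss_y2 = sxy ** 2 / sxx
b = sxy / sxx                        # usual regression coefficient

# Remaining n - 2 degrees of freedom, by subtraction.
residual_ss = sum(v * v for v in y) - ss_y1 - ss_y2

# The same quantity from the deviations about the predicted values.
predicted = [y_bar + b * d for d in dev]
direct_ss = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

variance_estimate = residual_ss / (n - 2)
print(b, residual_ss, direct_ss, variance_estimate)
```

The residual sum of squares found by subtraction agrees with the sum of squared deviations from the fitted line, and its divisor is n - 2.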


6. 

It is fairly laborious to calculate the $SS(Y_j)$
and because of this it is desirable to have a method
whereby the sum of squares associated with
several linear functions may be conveniently
found. The proof of the method is fairly long and
will not be reproduced here. Its exposition will
have to suffice.

Let $a_j$ be the coefficient vector of the random
variables of the jth degree of freedom and let y
be the observation vector, $(y_1, y_2, \ldots, y_n)$. With
these values construct the following system of
equations:

$$\begin{aligned}
p_1 (a_1 \cdot a_1) + p_2 (a_1 \cdot a_2) + \cdots + p_m (a_1 \cdot a_m) &= (a_1 \cdot y) \\
p_1 (a_2 \cdot a_1) + p_2 (a_2 \cdot a_2) + \cdots + p_m (a_2 \cdot a_m) &= (a_2 \cdot y) \\
&\;\;\vdots \\
p_1 (a_m \cdot a_1) + p_2 (a_m \cdot a_2) + \cdots + p_m (a_m \cdot a_m) &= (a_m \cdot y)
\end{aligned} \tag{14}$$

When these equations are solved, by whatever
method is convenient, the sum of squares for the
m degrees of freedom, $Y_1, Y_2, \ldots, Y_m$ ($m \le n$),
is given by

$$p_1 (a_1 \cdot y) + p_2 (a_2 \cdot y) + \cdots + p_m (a_m \cdot y). \tag{15}$$

The method reveals the correct sum of squares
whether or not the degrees of freedom are mutually
orthogonal, but we shall illustrate it for the
orthogonal case. Consider again (2) and then let
$a_2 = (\tfrac12, -\tfrac12, 0)$, $a_3 = (\tfrac13, \tfrac13, -\tfrac23)$
and $y = (y_1, y_2, y_3) = (24, 18, 36)$. Corresponding
to (14) we have

$$p_2 (\tfrac12) + p_3 (0) = 3; \qquad p_2 (0) + p_3 (\tfrac23) = -10. \tag{16}$$

Therefore $p_2 = 6$ and $p_3 = -15$; then
$SS(Y_2) + SS(Y_3) = 6(3) + (-15)(-10) = 168$. In some
previous work in section 3, we found $SS(Y_2) = 18$
and $SS(Y_3) = 150$, so this result checks. In this
problem, $Y_1$ was neglected in order to show that
(14) is quite general for any $m \le n$.
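
The method of this section may be sketched in Python as follows. The elimination routine and all names are assumptions made for illustration (any convenient method of solution would do), and the second set of contrasts is a hypothetical non-orthogonal pair added to show that the same sum of squares results. With the coefficients of (2) as read here, the product $a_3 \cdot y$ is negative, so $p_3$ is negative as well; their product is unaffected.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def solve(matrix, rhs):
    """Solve a small non-singular linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def sum_of_squares(vectors, y):
    """Set up system (14), solve for the p's, and evaluate expression (15)."""
    gram = [[dot(u, v) for v in vectors] for u in vectors]
    rhs = [dot(u, y) for u in vectors]
    p = solve(gram, rhs)
    return p, sum(pj * bj for pj, bj in zip(p, rhs))

y = [F(24), F(18), F(36)]

# Orthogonal contrasts of (2).
orthogonal = [(F(1, 2), F(-1, 2), F(0)), (F(1, 3), F(1, 3), F(-2, 3))]
p_orth, ss_orth = sum_of_squares(orthogonal, y)

# A hypothetical pair of contrasts that are not mutually orthogonal.
oblique = [(F(1, 2), F(-1, 2), F(0)), (F(0), F(1, 2), F(-1, 2))]
p_obl, ss_obl = sum_of_squares(oblique, y)

print(p_orth, ss_orth, p_obl, ss_obl)
```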


7.

All of these principles may be easily generalized
to the multivariate case. What is needed is
to use matrix variables instead of the single ones
we have been using. Using the Least Squares
Principle, the ideas presented here (and many
others) have been applied to multivariate analysis
of variance in reference number 4. The following
and last example is taken from this source.

Suppose,

$$y_1^1 = 11, \; y_2^1 = 5, \; y_3^1 = 8; \qquad y_1^2 = 2, \; y_2^2 = 6, \; y_3^2 = 13.$$

Here the superscripts indicate which variate is
being considered (these numbers are not to be
confused with powers), and the subscripts designate
the variables. Also let

$$Y_1^\alpha = \tfrac13 y_1^\alpha + \tfrac13 y_2^\alpha + \tfrac13 y_3^\alpha, \quad Y_2^\alpha = \tfrac12 y_1^\alpha - \tfrac12 y_2^\alpha \quad \text{and} \quad Y_3^\alpha = \tfrac13 y_1^\alpha + \tfrac13 y_2^\alpha - \tfrac23 y_3^\alpha,$$

where $\alpha = 1, 2$. We have, corresponding to (14),

$$\begin{aligned}
p_1^1 (\tfrac13) + p_2^1 (0) + p_3^1 (0) &= 8 \\
p_1^1 (0) + p_2^1 (\tfrac12) + p_3^1 (0) &= 3 \\
p_1^1 (0) + p_2^1 (0) + p_3^1 (\tfrac23) &= 0
\end{aligned} \tag{17}$$

Therefore, $p_1^1 = 24$, $p_2^1 = 6$, $p_3^1 = 0$ and using (15)
we find $(24)(8) + (6)(3) + (0)(0) = 210$, which is equal
to $11^2 + 5^2 + 8^2$. For the second variate,

$$\begin{aligned}
p_1^2 (\tfrac13) + p_2^2 (0) + p_3^2 (0) &= 7 \\
p_1^2 (0) + p_2^2 (\tfrac12) + p_3^2 (0) &= -2 \\
p_1^2 (0) + p_2^2 (0) + p_3^2 (\tfrac23) &= -6
\end{aligned} \tag{18}$$

Solving, we get $p_1^2 = 21$, $p_2^2 = -4$, $p_3^2 = -9$ and
corresponding to (15), $(21)(7) + (-4)(-2) + (-9)(-6) = 209$,
which is equal to $2^2 + 6^2 + 13^2$. The sum of
cross-products of these three vector degrees of
freedom for the two variates may be found in one
of two ways; either $(24)(7) + 6(-2) + 0(-6) = 156$
or $(21)(8) + (-4)(3) + (-9)(0) = 156$. Both results
are equal to $(11)(2) + (5)(6) + (8)(13)$. The matrix

$$\begin{pmatrix} 210 & 156 \\ 156 & 209 \end{pmatrix}$$

corresponds to the total sum of squares and cross
products for the bivariate sample observations
which have been transformed by the vector degrees
of freedom $Y_j^\alpha$, $j = 1, 2, 3$. We note that
the sums of squares and cross-products of the
variables for each variate are preserved by the
orthogonal vector set of degrees of freedom. This
simple problem serves to illustrate this invariance
property for a multivariate case.
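
The bivariate example can be reproduced with the Python sketch below. It assumes the three coefficient vectors are the mean and the two contrasts of (2), which is the reading consistent with the numerical results quoted in this section; because these vectors are mutually orthogonal, each p is found by a single division rather than by solving the full system.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Coefficient vectors of the three degrees of freedom (assumed as in (2)).
A = [
    (F(1, 3), F(1, 3), F(1, 3)),
    (F(1, 2), F(-1, 2), F(0)),
    (F(1, 3), F(1, 3), F(-2, 3)),
]

# Observations on the two variates.
variates = [(F(11), F(5), F(8)), (F(2), F(6), F(13))]

# Right-hand sides (a_j . y) and solutions p_j for each variate.
# With mutually orthogonal a_j the system (14) is diagonal,
# so p_j = (a_j . y) / (a_j . a_j).
rhs = [[dot(a, y) for a in A] for y in variates]
p = [[dot(a, y) / dot(a, a) for a in A] for y in variates]

# Matrix of sums of squares and cross-products, built as in (15).
sscp = [
    [sum(p[r][j] * rhs[c][j] for j in range(len(A))) for c in range(2)]
    for r in range(2)
]

# Direct sums of squares and cross-products of the observations.
direct = [[dot(u, v) for v in variates] for u in variates]

print(p, sscp, direct)
```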


8. Summary 


We have seen that certain statistical problems
are formulated in terms of linear functions of the
random variables. These linear functions, called
degrees of freedom, serve the purpose of presenting
the data in a more usable form because
the functions lead directly or indirectly to estimates
of the parameters of the observation model and the
estimate of variance of the observations. Moreover,
these estimates may be used to test hypotheses
about the population parameters by the standard
statistical tests.

Modern statistical usage of the concept of degrees
of freedom had its inception in Student's
classic work, reference 7, which is often considered
the paper which was necessary to the development
of modern statistics. Fisher, beginning
with his frequency distribution study, reference
2, generalized this work in his many contributions
to the general theory of regression analysis.

This paper has resulted from an attempt to
bring clarification to the statistical interpretation
of degrees of freedom. The author feels that
his attempt will not be altogether successful for
there remain many questions which students may
or should ask that have not been answered here.

A satisfactory exposition could be given by a
complete presentation of the theory of least squares
which is slanted towards the problems of modern
regression theory of the analysis of variance type.
This discussion would appropriately take book
form, however.


REFERENCES 


1. Cramer, Harald, Mathematical Methods of
Statistics (Princeton, N.J.: Princeton University
Press, 1946).

2. Fisher, Ronald A., "Frequency Distribution
of the Values of the Correlation In Samples
from an Independently Large Population,"
Biometrika, X (1915), pp. 507-521.

3. Johnson, Palmer O., Statistical Methods in
Research (New York: Prentice Hall, Inc.,
1948).

4. Moonan, William J., The Generalization of
the Principles of Some Modern Experimental
Designs for Educational and Psychological
Research. Unpublished thesis, University
of Minnesota, Minneapolis, Minnesota, 1952.

5. Rulon, Phillip J., "Matrix Representation
of Models for the Analysis of Variance and
Covariance," Psychometrika, XIV (1949),
pp. 259-278.

6. Snedecor, George W., Statistical Methods
(Ames, Iowa: Collegiate Press, 1946).

7. Student, "The Probable Error of the Mean,"
Biometrika, VI (1908), pp. 1-25.

8. Tukey, John W., "Standard Methods of
Analyzing Data," Proceedings: Computation
Seminar (New York: International Business
Machines Corporation, 1949), pp. 95-112.

9. Walker, John W., "Degrees of Freedom,"
Journal of Educational Psychology, XXI
(1940), pp. 253-269.

10. ... for Elementary Statistics (New York: Henry
Holt and Co., 1951).

11. Yates, Frank, Factorial Experiments, Imperial
Bureau of Soil Science, Technical Communication
No. 35, Harpenden, England: 1937.








A SIMPLIFIED χ² FORMULA FOR RAPID
COMPUTATION OF CERTAIN ITEM-ANALYSIS
DATA WITH IBM PUNCHED-CARD
EQUIPMENT
ANALYSIS OF test items for their power to
discriminate between persons in various
classifications is one of the most tedious
computational tasks in test construction. Derivation of
biserial r, for example, requires the computation
of proportions or of the mean score of all those
who responded correctly to each item. With these
raw data, biserial r may be read from charts,¹
but the labor of accumulating the raw data cannot
be avoided. Even if this material is obtained
from responses recorded on punched cards, each
set of test responses must first be punched into
the cards. Machine methods for computing
biserial r are also available.² Charts and machine
methods have been developed for computation of
tetrachoric r,³ and much of the labor of gathering
raw data has been lightened by the development
of the graphic item counter attachment of
the IBM electrical test-scoring machine⁴ and by
machine methods of making the counts of
frequency within various classifications.⁵

The computational problems are multiplied,
even with these labor-saving devices, when a
great many sub-classes of the total test population
are analyzed, and the accuracy of interpolating
from published or home-made graphs is cumulatively
reduced when several hundred or thousand
such computations must be made. The writers,
faced with an analytic task involving estimation
of the significance of 45 items in each of two tests
according to many classifications, as described
below, have developed what appears to be a very
simple and practical method of locating items
which merit more detailed analysis by more refined
methods. The use of the chi square test of
significance, as outlined by McNemar and others,
proved to be practical and rewarding, but the
large number of computations involved led us to
develop a simpler form for our purposes. If the
frequencies of correct and incorrect responses to
a test item are counted within two classifications
and arranged as

                Wrong    Right
    (Upper Q)     A        B
    (Lower Q)     C        D

then

$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}. \tag{1}$$

Classifications may vary; sex, grade, economic
status, IQ-range, etc., may be used rather than
quartiles. P is estimated from χ² with df = 1,
and for our purposes the following table is
sufficient:
Since the test we are analyzing appears to
discriminate somewhat against girls, we can test
the effect of each item at each grade level for
each form in contributing to this result. In this
case, we re-examine every item which discriminates
between sexes to a nonchance degree; it
may appear, for example, that an item calls forth
significant differences in response by boys and
girls at the twelfth grade level but not at the ninth.
The general uses of the χ² test are well known,
but some students are discouraged from using it
extensively when many computations are involved.
In our case, if we analyze results (total score,
45 items) in four grades and for the all-grades
total, for both sexes combined and for each
separately, and for two forms, we must compute
45 × 5 × 3 × 2 χ²'s (i.e., 1350). If we add to
this a further analysis according to other
classifications or by sub scores, we increase the
computational labor proportionately.

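Taking advantage of the fact that in this type
of analysis A + B = C + D, we reduce the familiar
formula (1) as follows. Let $p_1 = B/n_1$ and
$q_1 = A/n_1$ be the proportions of right and wrong
responses in the upper group, and $p_2 = D/n_2$ and
$q_2 = C/n_2$ the corresponding proportions in the
lower group, so that

$$q_1 + p_1 = 1, \qquad q_2 + p_2 = 1.$$

Note that $n_1 = n_2 = N/2 = n$, $p = (B + D)/N$, and
$q = (A + C)/N$. Since, when df = 1,

$$\chi^2 = \frac{(p_1 - p_2)^2}{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}} = \frac{\left(\dfrac{B - D}{n}\right)^2}{\dfrac{2pq}{n}} = \frac{\left(\dfrac{B - D}{N/2}\right)^2}{2 \cdot \dfrac{(A + C)}{N} \cdot \dfrac{(B + D)}{N} \Big/ \dfrac{N}{2}}.$$

This reduces to

$$\chi^2 = \frac{N(B - D)^2}{(A + C)(B + D)}, \tag{2}$$

and since (A + C) = (N - [B + D]),

$$\chi^2 = \frac{N(B - D)^2}{(N - [B + D])(B + D)}. \tag{3}$$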
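
The agreement of the reduced formula with the familiar four-cell formula can be confirmed with a short Python sketch. The cell counts are those of the first of the paired items tabulated later in this paper; the function names are hypothetical, and the four-cell formula is taken in its usual uncorrected form.

```python
from fractions import Fraction as F

def chi_square_full(A, B, C, D):
    """Four-cell chi square, formula (1), without continuity correction."""
    N = A + B + C + D
    return F(N * (A * D - B * C) ** 2,
             (A + B) * (C + D) * (A + C) * (B + D))

def chi_square_short(B, D, N):
    """Reduced form, formula (3); valid only when A + B = C + D = N/2."""
    return F(N * (B - D) ** 2, (N - (B + D)) * (B + D))

A, B, C, D = 20, 110, 95, 35          # upper and lower groups of 130 each
N = A + B + C + D

full = chi_square_full(A, B, C, D)
short = chi_square_short(B, D, N)

print(float(full), float(short))
```

Only B, D, and N enter the reduced form, which is why the count of right responses in each group is all that need be punched.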
In brief, B and D are obtained with the use of
the graphic item counter, and N remains constant 
within any given system of classification. The 
number of factors required for computation is 
thus reduced to three. When a large number of 
computations is required, as in our present case, 
it is practical to devise a method for calculating 
the chi square with IBM equipment. The follow- 
ing description applies to the type 602-A calculat- 
ing punch machine but is applicable to any such 
calculator or to common desk calculators of the 
Marchant, Friden, or Monroe type. 

We take advantage of the fact that multiplication
and division can be done during the same
program (i.e., in the type 602-A) and break
formula (3) into two parts:

$$x = \frac{(B - D)(B - D)}{(B + D)} \tag{3a}$$

$$\chi^2 = \frac{xN}{(N - [B + D])} \tag{3b}$$

Computation and punching of chi squares requires
six programs plus read cycle and quotient
development cycles. On the average, from ten to twelve
chi squares per minute may be computed in this
way. Our 1350 are finished in about 125 minutes.
Considerable experimentation indicates that the
breakdowns (3a and 3b) are the most efficient of
the several possible. Results compare to .0001
with those obtained with the full formula on a
desk-type calculator, and quotient size is the least that
may be programmed.
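
The two-part breakdown can be mirrored in a few lines of Python. This is only a sketch of the arithmetic order, not of the machine wiring; the function name and the name `x` for the intermediate product are assumptions, and exact fractions stand in for the machine's fixed decimal places.

```python
from fractions import Fraction as F

def chi_square_two_stage(B, D, N):
    """Return (chi_square, reverse) using the split into (3a) and (3b).

    B and D are the counts of right responses in the upper and lower
    groups; N is the total of both groups. B + D must not be zero or N.
    """
    difference = abs(B - D)
    # (3a): divide |B - D| by (B + D) and multiply the quotient by |B - D|.
    x = F(difference, B + D) * difference
    # (3b): divide x by (N - [B + D]) and multiply the quotient by N.
    chi_square = x / (N - (B + D)) * N
    # D > B corresponds to the negative balance noted in program 1.
    return chi_square, D > B

chi_square, reverse = chi_square_two_stage(110, 35, 260)
print(float(chi_square), reverse)
```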

Detailed machine procedures are as follows: 


1. Cards containing score data are sorted on the
proper columns and placed in ascending order.
Cards are then selected according to the
classification to be used. (N cards may be selected
from each end of the deck with the IBM collator
if quartiles are to be used.) If sex differentiation
were being analyzed, an equal random sample of
each could be selected, or else upper and lower
quarters within the sex grouping.

2. Names and/or numbers of test papers in each
selected group are then tabulated on the type
405 accounting machine, and the test sheets
are selected and grouped accordingly.

3. If test responses were made on electrically
scored answer sheets, the graphic item counter
may be used to count frequency of correct
responses within each grouping.

4. The number of correct responses in the upper
quarter (cell B) and in the lower quarter (cell
D) are then punched into cards with the proper
item number, identifying data, etc. The N
(total of both classifications) can be
gang-punched or duplicated for an entire set or
subset of cards.

5. Calculator programming (see diagram)


READ cycle: N and D are stored; B is added
in counter 4, subtracted in 5, and added
beside N (if neither exceeds three digits)
in counter 6.

Program 1: D is subtracted in counter 4, added
in 5, and added to the B already in 6. A
negative balance impulse will be available
at counter 4 if D > B. This NB transfers
pilot selector 2.

Program 2: B + D is read out of the left side of
counter 6 through co-selector 6 (normal)
and stored as the divisor for (3a). |B - D|
is read out (counter 4 if no NB in program 1,
counter 5 if NB) and added in
coupled counters 1, 2, and 3 as the (3a)
dividend; it is also stored as the
multiplicand in (3a).

Program 3: Divide hub is impulsed, B + D is read
out of divisor storage and subtracted
from counters 1, 2, 3 in the usual way.
Instead of emitting from the QUOT hub
(J-23), however, we read |B - D| out of
storage as a multiplicand for which the
developing quotient is the multiplier, and
the product x is developed in counters
7 and 8.

Program 4: B + D is subtracted from N in counter
6, and counters 1, 2, 3 are reset.

Program 5: The divisor for (3b) is read out of
counter 6 through the transferred side
of co-selector 6 and into divisor storage.
The new dividend, x, is read out of
counters 7 and 8 and into counters 1, 2,
3; counters 7, 8 are reset, and the proper
column is reset to 5 for half adjustment.


Program 6: (See program 3) The new multiplicand
is N, and the product, chi square,
develops in counters 7, 8. READ hub
is impulsed, and chi square is read out
of 7, 8 into punch storage for punching
during the reading of the next card. If
pilot selector 2 was transferred during
program 1 (D > B), its punch control exit
will become active just prior to punching
and will transfer pilot selector 4.
When transferred, PS 4 allows a combination
of a nine and X-punch to be
punched in the column immediately following
those in which chi square is
punched. This "R" can later be used
to identify those items which discriminate
against those in classification B (i.e.,
has "reverse" meaning). The punched
cards are then interpreted or tabulated
for purposes of study.

7. T. C. Kelley, "The Selection of Upper and Lower Groups for the Validation of Test
Items," Journal of Educational Psychology, XXX (1939), pp. 17-24.

C. W. Vickery, "On Drawing a Random Sample from a Set of Punched Cards," Journal of
the Royal Statistical Society (Supplement), VI (1939), pp. 62-66.


It must be recalled that the chi square formula
does not require any a priori arrangement of data
in the cells. Thus the derived chi square for the
following two items would be identical, but the
second would have a "reverse" significance.

                     1.                  2.
               Wrong   Right       Wrong   Right
    Upper Q      20     110         110      20
    Lower Q      95      35          35      95

Once a satisfactory machine routine has been
worked out, the calculation of chi squares for
other analytic purposes can be accomplished
quickly and accurately.
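
The behavior of the two tabulated items can be checked in Python. The sketch below applies formula (3) to each set of cell counts and attaches an "R" when the lower group exceeds the upper group in right responses; the function name and the use of a text flag in place of the punched code are assumptions.

```python
from fractions import Fraction as F

def item_chi_square(A, B, C, D):
    """Chi square by formula (3) with a reverse flag.

    A, B are wrong and right counts in the upper group; C, D are wrong
    and right counts in the lower group. The groups must be equal in size.
    """
    if A + B != C + D:
        raise ValueError("formula (3) requires A + B = C + D")
    N = A + B + C + D
    chi_square = F(N * (B - D) ** 2, (N - (B + D)) * (B + D))
    flag = "R" if D > B else ""
    return chi_square, flag

item_1 = item_chi_square(20, 110, 95, 35)
item_2 = item_chi_square(110, 20, 35, 95)

print(float(item_1[0]), repr(item_1[1]))
print(float(item_2[0]), repr(item_2[1]))
```

The two chi squares are identical, and only the flag distinguishes the item that favors the lower group.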

The selection of five decimal places in the
dividend unit, as shown in the diagram, is the
result of several trials. Use of four or fewer
decreases the number of significant figures in the
χ², and more than five produces a final result of
more than the required number of significant
figures (see table of P, above).⁸ As a general rule,
the number of digits in N, plus two digits for the
number of significant figures right of the χ²
decimal, equals the number of digits to allow to the
right of the decimal in counters 1, 2, 3.


Summary 


When (A + B) = (C + D), i.e., when (A + B)
and (C + D) each equal N/2, and it thus becomes
possible to calculate χ² from formula (3), the
simple procedure outlined above has proved of
considerable value to us in locating items which
require detailed examination for such purposes
as these:

1. Analysis of the possible contribution of each
item to test reliability.

2. Estimation of the validity of each item in terms
of various classifications of purpose or interest.

3. Analysis of the possible bias inherent in each
item (e.g., against either sex, against children
in a certain IQ range or of a particular
age group, etc.).

4. Identification of those items which may yield
valid or significant measures of differences
within various possible sub-classes of the test
population and not within others.

Formula (3), once again, cannot be used if
either (A + B) or (C + D) ≠ N/2.





8. The cumulative effect of dropped decimal remainders through two divisions and two
multiplications is magnified when the quantities are small; precision becomes more
important as χ² becomes smaller.








THE EDUCATION FACTOR IN PERSONALITY 
APPRAISAL’ 


SISTER MARY AMATORA, O.S.F.
St. Francis College 
Ft. Wayne, Indiana 


AWARENESS OF the significance of the impact 
made by the teacher on the pupil was scarcely considered,
much less investigated, several decades
ago. Throughout the past ten or fifteen years it 
has been refreshing to note that research workers 
have evidenced increased interest in this vital 
component of the learning situation, namely, 
teacher-pupil relationships. Nonetheless, many 
facets of the problem still await more extensive 
and intensive experimentation. One such study 
is herewith presented. 

The subjects used in the present investigation 
included 485 teachers and 1542 elementary school 
pupils from grades four through eight inclusive. 
Schools were selected on a representative basis, 
including small and large schools, rural and ur- 
ban, private and public schools. All are located 
in the state of Indiana, though selection was made 
from various parts of the state and from various 
socio-economic groups. 

The measuring instrument used by the teach- 
ers for appraising the personality of their pupils 
was the Child Personality Scale (3). This scale 
utilizes the rating technique on a ten-point con- 
tinuum on each of twenty-two separate personal- 
ity traits. All scales were administered by the 
writer. 


Analysis of Data 





The purpose of the present analysis of the da- 
ta was to ascertain whether or not there be differ- 
ences in the teachers’ judgments of the person- 
alities of their pupils relative to the educational 
level attained by the teacher. Accordingly, the 
papers were divided into four groups: In Group 
I were placed the papers of all the children rated 
by teachers who had had two years or less of col- 
lege education. Fifteen percent of the teachers 
fell in this category. Group II included all those 
teachers who had had three years of college education.
This took another twenty percent of the
teachers. All those teachers who had had four
years of college education comprised Group III,
the largest group of the present study, or forty-five
percent of the total number of teachers studied.
The last group, Group IV, included all teachers
who had had five or more years of college
education. Here again twenty percent of the teachers
in this investigation were listed.

The equating of groups on the basis of N would 
have obviated some forty percent of the papers. 
Hence it was deemed inadvisable for present pur- 
poses. 
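
The forty percent figure can be checked with a little arithmetic. The sketch below assumes that the papers were distributed across the four groups in roughly the same proportions as the teachers; the text reports only the teacher percentages, so this proportionality is an assumption and not a finding of the study.

```python
# Share of teachers in each education group, as reported in the text.
group_share = {"I": 0.15, "II": 0.20, "III": 0.45, "IV": 0.20}

# Equating groups on N means trimming every group down to the size
# of the smallest one (Group I, fifteen percent).
smallest = min(group_share.values())
retained = smallest * len(group_share)   # four groups of equal size
discarded = 1.0 - retained               # papers that would be dropped

print(f"retained: {retained:.0%}, discarded: {discarded:.0%}")
```

Under that assumption, four groups of fifteen percent each retain sixty percent of the papers, which agrees with the loss of some forty percent mentioned above.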

The listing of four or of five years of college 
education does not necessarily imply a bachelor’s 
or a master’s degree. Some, particularly the 
older members of the groups, may have had simply 
an accumulation of course credit hours achieved
from a number of colleges or universities through- 
out the United States during summer sessions. 
This additional factor is not analyzed in the pres- 
ent study. 

The ‘‘t’’ technique was used to study the way 
in which each group differed from each other 
group in each of the twenty-two traits on which 
the children were rated. 
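
The text does not say which form of the ‘‘t’’ test was used. As a minimal sketch, assuming a Student's t for two independent samples with pooled variance, the comparison scheme might be laid out as follows. The function name `pooled_t` and the demonstration ratings are hypothetical and are not data from the study; only the four groups and the twenty-two traits come from the text.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

GROUPS = ["I", "II", "III", "IV"]   # teacher education groups from the text
N_TRAITS = 22                       # traits on the Child Personality Scale


def pooled_t(a, b):
    """Student's t for two independent samples with pooled variance.

    This form of the test is an assumption; the source names only
    the 't' technique.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Every group is compared with every other group on every trait.
pairs = list(combinations(GROUPS, 2))
values_per_table = len(pairs) * N_TRAITS
values_involving_group_one = sum(1 for p in pairs if "I" in p) * N_TRAITS

print(len(pairs), values_per_table, values_involving_group_one)

# Hypothetical ten-point ratings on one trait from two groups (illustration only).
ratings_low = [4, 5, 5, 6, 5]
ratings_high = [6, 7, 6, 8, 7]
t_demo = pooled_t(ratings_high, ratings_low)
print(round(t_demo, 3))  # positive when the first sample has the higher mean
```

Four groups give six pairings, hence 132 ‘‘t’’ values for each table of twenty-two traits, of which 66 involve Group I. These are the totals quoted in the discussion of the groups.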

Group I— This group of teachers with the least 
amount of education seems to have shown up the 
least well in its ratings of its pupils. In most 
instances the other group with which it was com- 
pared showed higher ratings. In their ratings of 
boys, these teachers exceeded the ratings of other 
groups only nine times out of the total of 66 ‘‘t’’ 
values. In their ratings of girls, they exceeded 
those of the other groups only five times. In rat- 
ings of the sexes combined, they exceeded the 
others five times. On none of these values was 
the ‘‘t’’ sufficiently high to indicate statistical 
significance. 

On the other hand, Group II exceeded Group I
on 18 items, Group III on 20 items, and Group IV
on 19 items in ratings of boys. In the ratings of
girls, the corresponding counts were 20, 19, and
22 respectively; and in the ratings of sexes com- 
bined, they were 20, 19, and 20 respectively. Of 
the latter, five of the ‘‘t’’ values were statistically
significant at the one percent level of confidence,
and thirteen others were statistically significant
at the five percent level. In the separate ratings
for boys, two were significant at the one percent
level, and four more at the five percent level.
In the ratings of girls, the differences
were still more marked, in that seven ‘‘t’’ values
were statistically significant at the one percent
level and ten more were so at the five percent
level.

*This paper was given at the convention of the American Psychological Association,
Chicago, September 1, 1951. The research was aided in part by a Grant-in-Aid from
the Psychological Corporation, for which acknowledgement is hereby made.

The traits on which the greatest differences 
were found were sociability, punctuality, thoughtfulness,
boisterous-quietness, disposition, neatness,
honesty, courtesy, and patience. In the
order here given the other groups consistently 
rated the children higher on these traits. On 
most of the other traits they also rated them higher
than did Group I, but the differences fell short of
significance at the five percent level. On only two traits did
the first group rate the children higher, namely,
on religiousness and nervous-calmness; and these
‘‘t’’ values were less than 1.000 in all cases.

Groups II, III and IV—These three groups
might be considered together, as the differences
in all cases were small. In no case did any of the
‘‘t’s’’ reach statistical significance at the five percent
level. Yet, there is some meaning in the low
but consistent values. In the analysis of the ratings
of boys, it was found that the third group rated
them higher more often than the others. In the
total of 132 ‘‘t’’ values (Table I), this group exceeded
the others 51 times, whereas Group II exceeded
the others 37 times and Group IV did so
33 times.

In the ratings for the girls (Table II), the situation
was somewhat reversed, in that Group III
exceeded the others only 29 times, whereas Group
II did so 42 times and Group IV did so in 54 cases.
This would seem to indicate a definite preference 
of Group III for the boys whom they teach, with 
the situation reversed for Groups II and IV. 

In studying the analysis of the combined ratings 
of boys and girls (Table III), the individual differences
drop out. Here Group II exceeds the others
38 times; Group III does so 41 times, and Group
IV does so 40 times, in the total of 132 ‘‘t’’ values.

AMATORA 275


Conclusions 


Though differences among the individual groups 
are not great, and often not statistically significant,
yet the consistency with which they do occur
is revealing. 

The over-all picture presented in this piece 
of research does seem to point to the value of 
higher educational levels for the teacher if she 
is to have a better understanding of the personal- 
ities of her pupils. At least for the group stud- 
ied, the evidence is quite convincing that the tea- 
cher with two years or less of college education 
does not understand her pupils as well as do her 
colleagues with greater educational advantages. 

In the grand total of 396 ‘‘t’’ values computed 
for this study, the teachers with the least education
were favored only 19 times. The teachers
with three years of college education won favor 
on the critical values in 117 cases; the teachers 
with four years of college education scored high- 
er in 121 cases; and the teachers with five years 
of college education had the differences in their 
favor 127 times. The last three mentioned fig- 
ures are not greatly dissimilar; yet there is a 
notable progression in favor of the increased lev- 
el of education possessed by the teachers. 
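
The grand totals quoted above can be reproduced from the per-table counts given in the analysis. The sketch below only re-adds figures already reported in the text; the variable names are illustrative, and the reading of the remainder as ties is a guess, since the text does not account for it.

```python
# Times each group's rating exceeded that of the group it was compared
# with, as reported in the text for Tables I, II, and III.
wins = {
    "boys":     {"I": 9, "II": 37, "III": 51, "IV": 33},
    "girls":    {"I": 5, "II": 42, "III": 29, "IV": 54},
    "combined": {"I": 5, "II": 38, "III": 41, "IV": 40},
}
VALUES_PER_TABLE = 132  # six group pairings times twenty-two traits

# Sum each group's count across the three tables.
totals = {
    group: sum(table[group] for table in wins.values())
    for group in ["I", "II", "III", "IV"]
}
grand_total = VALUES_PER_TABLE * len(wins)

# Values not credited to any group; the text does not explain these
# (ties between two groups are one possible reading).
unaccounted = grand_total - sum(totals.values())

print(totals)        # {'I': 19, 'II': 117, 'III': 121, 'IV': 127}
print(grand_total)   # 396
print(unaccounted)   # 12
```

The per-table counts therefore agree with the totals of 19, 117, 121, and 127 given in the conclusions, and with the grand total of 396 ‘‘t’’ values.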

Hence, if this group of teachers be represent- 
ative of elementary school teachers, then one 
might strongly emphasize the need for more
and more adequate education for many teachers in the
elementary schools. This, tending to facilitate 
pupil-teacher relationships and a better under- 
standing of the children on the part of their teach- 
ers, would greatly improve the learning situation 
for the children who are being educated. 